Abstract


This research examines a very large-scale integrated (VLSI) circuit implementation
of the Winograd and Good-Thomas algorithms for computing discrete Fourier
transforms (DFTs) with composite blocklengths. The theoretical background for
calculating DFTs in general is developed before the algorithms of interest are presented in
detail. Once the validity of the algorithms is established, a VLSI architecture, which
exploits the parallelism and pipelining inherent in the algorithms, is discussed.
Winograd processors use both the small and large Winograd algorithms to compute DFTs
with blocklengths of 15, 16, and 17. Longer blocklength DFTs (240, 255, 272, and 4080)
are computed using a pipeline of Winograd processors, dual-port memories, and an
interface processor; the pipeline uses the Good-Thomas Prime Factor Algorithm (PFA).


Fault tolerance was included in the initial design of the VLSI architecture. Watchdog
processors check both data and addresses of active Winograd processors, while parity
checking circuits incorporated in the Winograd processors augment memory
error-correction coding (ECC).

The numerical accuracy of the VLSI circuit was determined using a software
simulation. The signal-to-noise ratio (SNR) was used as the accuracy metric. The signal was
the output of a standard module, which used double-precision arithmetic, while the noise
was the difference between the outputs of the standard and simulation modules. The
simulation module used integer arithmetic to exactly mimic operation of the VLSI circuit. The
outputs of the standard module were also compared with a direct evaluation of the DFT to
verify the standard module did compute a DFT. Results of the comparison between the
standard and simulation modules for single-factor DFTs (15, 16, and 17) indicate the
VLSI circuit can produce information accurate enough for synthetic aperture radar and
other demanding applications.

Architecture and Numerical Accuracy of
High-Speed  DFT  Processing  Systems 


Chapter  1 
Introduction 


1.1.  Background. 

Current radar and image recognition systems require real-time computation of
discrete Fourier transforms (DFTs). These applications need spectral information, given
by the DFT, on a large number of sample points to obtain the necessary spectral
resolution for precise calculation of operations such as correlation and convolution. The DFT
is used since only a finite number of sampled values are available, rather than the
original analog signal. The DFT should not be computed directly because the number of
operations would be proportional to the square of the number of sample points (i.e.,
$O(N^2)$). Instead, the class of algorithms known as fast Fourier transforms (FFTs) is
usually employed, since the number of operations is only proportional to the number of
sample points times the logarithm of the number of sample points (i.e., $O(N \log N)$). The first
popular FFT algorithm, the Cooley-Tukey radix-two algorithm [5], is still widely used
today. The algorithm, developed in 1965, takes advantage of symmetry properties within
the DFT computation to reduce the number of operations.

Until the advent of very large-scale integrated (VLSI) circuits, most DFT
calculations were performed by general-purpose computers or by banks of circuit boards
containing medium- and large-scale integrated (MSI and LSI) circuits. The general-purpose
computers were used for most applications because they had better accuracy and
throughput, unless size and power requirements forced the use of integrated circuits.
The loss in throughput and/or accuracy when integrated circuits were used meant some
jobs had to be processed off-line, rather than in real time (e.g., synthetic aperture radar
and image processing from space vehicles). It is now possible, using VLSI circuits, to put
all the arithmetic and control circuitry necessary to perform DFT computations
onto a single chip. However, several single-chip processors may be needed for those
applications which require long blocklengths and/or high throughput. In the case of
synthetic aperture radar, the throughput and blocklength constraints are so severe that
off-line optical processing continues to be used [18].

One method of increasing the throughput is to reduce the number of operations
required to compute the DFT. Winograd has shown a class of algorithms (known as
Winograd Fourier Transform Algorithms; WFTA) to use the fewest number of
multiplications in computing the DFT [29]. Reducing the number of multiplications gives a
larger increase in performance than reducing the number of additions because
multiplications are more complicated, requiring several additions to yield the product. Thus,
an integrated circuit which implements a WFTA would be likely to have high
throughput. The remaining question is whether the circuit can provide the necessary
accuracy for the radar and image processing applications.

1.2.  Problem  Statement. 

The research presented in this thesis has two goals: 1) develop an architecture for
a VLSI circuit which computes DFTs using the WFTA; and 2) determine the numerical
accuracy of the VLSI circuit using a software program to simulate the numerical
operation of the circuit.

1.3.  Scope. 

This thesis report is the first of a series of four reports on the research of VLSI
circuits implementing a WFTA. This report will focus on the numerical simulation of the
VLSI circuit and the development of the circuit from a system-level viewpoint. The
other three thesis reports will cover the following areas:

1)  VLSI  arithmetic  circuitry; 

2)  VLSI  control  circuitry; 

3)  VLSI  circuit  simulation. 

A summary of information contained in the other three reports is presented in the
following paragraphs.


1.3.1. VLSI Arithmetic Circuitry. This area of the research is highlighted in the
thesis report of Captain Paul Coutee [6]. Captain Coutee discusses the modules
necessary to realize the multipliers and the adder/subtractor elements in the VLSI circuit.
The design of the modules, including optimizing the area, is presented in detail. Also,
the requirements of the arithmetic circuitry, from both a systems viewpoint and a circuit
viewpoint, are developed.

1.3.2. VLSI Control Circuitry. Captain Paul Rossbach [22] describes the modules
necessary to realize the control portion of the VLSI circuit, including the signals which
allow the circuit to compute the DFT and the generation of addresses for both input and
output data. A ring counter and a programmed-logic array (PLA) are used to generate
the control signals. The address circuitry uses a read-only memory (ROM) to store
the addresses for the input and output data. A special algorithm to reduce the number
of transistors in the ROM is presented in detail.


1.3.3. VLSI Circuit Simulation. A functional simulation of the VLSI circuitry is
presented in the thesis report of Captain James Collins [4]. Captain Collins discusses
the language requirements necessary to simulate operation of a 16-point Winograd
processor in detail, implementing such functions as parity checking and generation,
rounding, and scaling. A simulation in the 'C' computer language is given in detail,
showing the flow of information through the processor. Also, a description of the
16-point processor is presented using the VHSIC Hardware Description Language (VHDL).


1.4.  General  Approach. 

The general approach to developing the VLSI circuits which perform the WFTA
will be to first introduce the theory of how Winograd's algorithms allow DFT
computations to be calculated using convolution algorithms. Then, the VLSI circuits will be
presented, with block diagrams of individual processors and DFT systems incorporating
processors, memories, and host interfaces. Finally, the numerical simulation of the
circuits is discussed, including differences between the simulation module and the standard
module and results of the simulation.


1.5.  Overview  of  Remaining  Chapters. 

The remaining chapters in this thesis report will follow the general approach
outlined in the previous paragraph. Chapter 2 contains the necessary theory to understand
how the algorithms presented in this report can compute DFTs. First, the
Cooley-Tukey fast Fourier transform (FFT) algorithm is given, since it introduces the concept
of algorithms which use fewer operations than the direct evaluation of the DFT. Next,
the use of Winograd's short convolution algorithm is presented, showing a method of
quickly computing a cyclic convolution directly. Then, Rader's prime algorithm is
presented; it changes a DFT calculation into a cyclic convolution computation. The
combination of these two ideas gives rise to Winograd's small DFT algorithm. For longer DFT
blocklengths, Winograd's large DFT algorithm or the Good-Thomas Prime Factor
Algorithm (PFA) may be used with the small Winograd modules.

Chapter 3 has the information on the architecture of the VLSI circuit. Beginning
with the Winograd processors, the circuits are presented from a system-level viewpoint,
including block diagrams and control signal descriptions. The DFT processor, a system
of Winograd processors and associated memory and interface chips, is discussed next.
Finally, the characteristics of both the individual processors and the DFT system are
presented. Fault tolerance is discussed from a design viewpoint (i.e., incorporating fault
tolerance into the original design of the circuits, rather than as an additional item for
later development).

Chapter 4 contains the information on the numerical performance of the
architecture presented in Chapter 3. First, the metric used to determine the numeric accuracy,
the signal-to-noise ratio, is given, as well as some of the main sources of noise (noise
being the difference between the results from the standard and simulation modules).
Second, the details of the programming are given, especially the requirements of the
language to be used and the differences between the standard and simulation modules.
Lastly, the results of the simulation are presented, with the average signal-to-noise ratios
given for three DFT blocklengths.

Chapter 5 has the results, conclusions, and recommendations for this report.
Future research will be centered on the fabrication and testing of the VLSI circuits;
however, some theoretical work should be accomplished (i.e., validation of the 17-point
algorithm) and efforts should be made to determine the effects of coefficient wordlength and
different types of input data on the numerical accuracy. The appendices contain a
development of the 15-point DFT using the large Winograd algorithm, the simulation
programs, and the simulation results.


Chapter  2 
Theory 


2.1.  Overview 

The material in this chapter describes the algorithms used to compute the Discrete
Fourier Transform (DFT) and presents the theoretical background necessary to
understand why these algorithms were chosen. The algorithms used are Winograd's Small
DFT algorithm, Winograd's Large DFT algorithm, and the Good-Thomas Prime Factor
Algorithm (PFA). Winograd's Small DFT algorithm allows efficient computation of
DFTs with short blocklengths (e.g., 3, 5, 16, and 17). These short blocklength DFTs are
used in Winograd's Large DFT algorithm and the Good-Thomas PFA to compute DFTs
with longer blocklengths (e.g., 4080) and non-prime blocklengths (e.g., 15). The
information in this chapter is presented in the following order. First, the Cooley-Tukey
algorithm for computing DFTs, known as the fast Fourier transform (FFT), is given. Then,
the method of computing DFTs from cyclic convolutions is described. Next, the theory
of the Winograd modules is discussed. Finally, the Good-Thomas Prime Factor
Algorithm is explained.


2.2.  The  Cooley-Tukey  FFT  Algorithm. 

The DFT is a means of describing the discrete frequency components of a finite
sequence of values [17]. The DFT may be viewed as the Fourier transform of an
infinite, periodic sequence (where the original sequence is one period of the infinite
sequence) or as sampled values of the Z transform of the original finite sequence. The
DFT can be expressed using a summation form, as shown in (2-1):

$$X(k) = \sum_{n=0}^{N-1} x(n)\, W_N^{kn}, \qquad k = 0, 1, \ldots, N-1 \tag{2-1}$$

where $W_N = e^{-j 2\pi / N}$.

This direct evaluation of the DFT requires $4N^2$ real multiplications and $N(4N - 2)$ real
additions if the N input points are complex samples. As N grows, the number of
operations grows as $N^2$. Most algorithms which reduce the number of operations in
DFT computations take advantage of two properties of the coefficients used in the direct
form of the DFT. The first property is conjugate symmetry, as shown in (2-2):

$$W_N^{k(N-n)} = W_N^{-kn} = \left(W_N^{kn}\right)^{*} \tag{2-2}$$

Conjugate symmetry reduces the number of operations by about one-half. Also, certain
values of the product kn yield coefficients whose real and imaginary parts are 0 or ±1;
these are called trivial coefficients because they require no actual multiplication.
The other property which FFT algorithms exploit is periodicity, as shown in (2-3):

$$W_N^{kn} = W_N^{k(n+N)} = W_N^{(k+N)n} \tag{2-3}$$

Periodicity provides a greater reduction in the number of operations than does the
symmetry property.
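
As a rough illustration of the direct form and of the two coefficient properties just described, the following Python sketch evaluates the DFT sum term by term and checks conjugate symmetry and periodicity numerically. The function names (`twiddle`, `direct_dft`, `real_operation_counts`) are hypothetical and are not taken from the thesis; the operation counts simply restate the figures quoted in the text for complex input.

```python
import cmath

def twiddle(N, p):
    """Return W_N**p, assuming W_N = exp(-j*2*pi/N)."""
    return cmath.exp(-2j * cmath.pi * p / N)

def direct_dft(x):
    """Evaluate the DFT sum directly: N complex multiplications per output point."""
    N = len(x)
    return [sum(x[n] * twiddle(N, k * n) for n in range(N)) for k in range(N)]

def real_operation_counts(N):
    """Real multiplications and additions quoted in the text for complex input."""
    return 4 * N * N, N * (4 * N - 2)

if __name__ == "__main__":
    N, k, n = 8, 3, 5
    # Conjugate symmetry: W^(k(N-n)) equals the conjugate of W^(kn).
    assert abs(twiddle(N, k * (N - n)) - twiddle(N, k * n).conjugate()) < 1e-12
    # Periodicity: adding N to either index leaves the coefficient unchanged.
    assert abs(twiddle(N, k * (n + N)) - twiddle(N, k * n)) < 1e-12
    print(real_operation_counts(16))  # (1024, 992)
```

For a 16-point transform the direct form already needs about a thousand real multiplications, which is the cost the algorithms in the rest of this chapter try to avoid.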

The FFT algorithm achieves the dramatic decrease in the number of operations by
decomposing the DFT computation into successively smaller computations. The
decomposition may be performed in either the time (decimation in time) or the frequency
(decimation in frequency) domain. Usually, the decimation is performed when N is an even
integer. Then, the single summation in (2-1) can be separated into two summations, one
for even integers and one for odd integers (2-4):

$$X(k) = \sum_{r=0}^{N/2-1} x(2r)\, W_N^{2rk} + \sum_{r=0}^{N/2-1} x(2r+1)\, W_N^{(2r+1)k} \tag{2-4}$$

The symmetry and periodicity properties described previously can be applied to show
the two summations in (2-4) each correspond to an N/2-point DFT. Thus, two
N/2-point DFTs are computed and then combined to give an N-point DFT. The
decimation process is repeated until only 2-point transforms are computed. The number
of decimations is $\log_2 N$; therefore, the number of complex multiplications required by the
decimation-in-time FFT algorithm to compute the DFT is $(N/2)\log_2 N$.
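
The decimation-in-time recursion can be sketched in a few lines of Python. This is a minimal illustration under the assumption that the blocklength is a power of two, not the implementation studied in this thesis; the function name `fft_radix2` is hypothetical. It also counts one complex multiplication per butterfly so the count can be compared with $(N/2)\log_2 N$.

```python
import cmath

def fft_radix2(x):
    """Recursive decimation-in-time FFT; len(x) must be a power of two.

    Returns (X, mults), where mults counts one complex multiplication
    per butterfly, including multiplications by trivial coefficients.
    """
    N = len(x)
    if N == 1:
        return list(x), 0
    even, m_even = fft_radix2(x[0::2])   # N/2-point DFT of even-indexed samples
    odd, m_odd = fft_radix2(x[1::2])     # N/2-point DFT of odd-indexed samples
    X = [0j] * N
    for k in range(N // 2):
        t = cmath.exp(-2j * cmath.pi * k / N) * odd[k]  # the butterfly multiply
        X[k] = even[k] + t
        X[k + N // 2] = even[k] - t      # uses W^(k + N/2) = -W^k
    return X, m_even + m_odd + N // 2

if __name__ == "__main__":
    _, mults = fft_radix2([complex(i, 0) for i in range(16)])
    print(mults)  # 32, i.e. (16/2) * log2(16)
```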

Even though the number of real multiplications has decreased to about $2N \log_2 N$
(four real multiplications for each of the $(N/2)\log_2 N$ complex multiplications), the coefficients
($W_N$) must be computed for each of the N/2 points. Also, only two data words may be
read into the arithmetic circuitry for each multiplication/addition operation (often called
a 'butterfly'). These two characteristics usually provide the most severe limitation of
the FFT algorithms, that of input/output (I/O) bandwidth. The I/O bandwidth relates
to the rate at which data may be read into and out of memory. Thus, even with the
reduction in the number of operations given by the FFT algorithms, there are still some
problems which limit the usefulness of the algorithms.

2.3.  Discrete  Fourier  Transforms  From  Cyclic  Convolutions. 

One of the most frequent applications of the DFT is to compute the convolution of
two finite sequences. The convolution is a result of passing an input signal through a
linear filter. The mathematical representation of a filtering operation is expressed in the
time domain as a convolution of the input signal with the impulse response of the filter.
Often, it is easier to perform the computations in the frequency domain, where the
filtering operation is a multiplication of the spectrum of the input signal with the frequency
response of the filter. The resulting product can be inverse-transformed for time-domain
analysis. Since only a finite number of sampled values of the original analog signal are
available, the DFT is used, rather than the continuous-time Fourier transform. This
method of computing convolutions is chosen because computation of the DFT usually
involves fewer operations (multiplications and additions) than direct evaluation of the
convolution itself. However, this relationship may be reversed; a cyclic convolution may
be used to compute a DFT if the convolution algorithm requires fewer operations than
the DFT algorithm. Winograd [29] has shown such algorithms do exist.
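
The relationship between cyclic convolution and the DFT described above can be checked with a short sketch. The code below is illustrative only and uses hypothetical function names; it compares a direct cyclic convolution with one computed by transforming both sequences, multiplying the spectra point by point, and inverse-transforming.

```python
import cmath

def dft(x, inverse=False):
    """Direct DFT (or inverse DFT with 1/N scaling) of a sequence."""
    N = len(x)
    sign = 1 if inverse else -1
    X = [sum(x[n] * cmath.exp(sign * 2j * cmath.pi * k * n / N) for n in range(N))
         for k in range(N)]
    return [v / N for v in X] if inverse else X

def cyclic_convolution_direct(g, d):
    """s_i = sum over k of g[(i - k) mod N] * d[k]."""
    N = len(g)
    return [sum(g[(i - k) % N] * d[k] for k in range(N)) for i in range(N)]

def cyclic_convolution_via_dft(g, d):
    """Multiply the two spectra point by point, then inverse-transform."""
    G, D = dft(g), dft(d)
    return dft([a * b for a, b in zip(G, D)], inverse=True)

if __name__ == "__main__":
    g, d = [1, 2, 3, 4], [0, 1, 0, 0]
    print(cyclic_convolution_direct(g, d))  # [4, 1, 2, 3]
    print([round(v.real) for v in cyclic_convolution_via_dft(g, d)])  # [4, 1, 2, 3]
```

Convolving with a shifted unit sample rotates the sequence by one position, which makes the cyclic (wrap-around) nature of the operation easy to see.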


2.3.1. Winograd's Short Convolution Algorithm. The following development of methods
to compute cyclic convolutions was adapted from Blahut [3] and Winograd [29]. A cyclic
convolution may be written as:

$$s(x) = g(x)\, d(x) \bmod m(x) \tag{2-5}$$

where the degree of g(x) and d(x) is (N - 1) and $m(x) = x^N - 1$.

The coefficients of s(x) are expressed as:

$$s_i = \sum_{k=0}^{N-1} g_{((i-k))}\, d_k, \qquad i = 0, 1, \ldots, N-1 \tag{2-6}$$

where the double parentheses denote modulo N arithmetic.

This may be broken into several smaller computations by factoring m(x) into K
relatively prime polynomials $m^{(k)}(x)$ (i.e., those without any common factors). The residues of
g(x) and d(x) for each of the K factors are computed:

$$g^{(k)}(x) = R_{m^{(k)}(x)}[g(x)], \qquad d^{(k)}(x) = R_{m^{(k)}(x)}[d(x)], \qquad k = 0, 1, \ldots, K-1 \tag{2-7}$$

where $R_{m^{(k)}(x)}[f(x)]$ represents taking the residue of f(x) modulo $m^{(k)}(x)$.

Then, the kth residue of s(x) may be found by:

$$s^{(k)}(x) = g(x)\, d(x) \bmod m^{(k)}(x) = g^{(k)}(x)\, d^{(k)}(x) \bmod m^{(k)}(x) \tag{2-8}$$

Finally, s(x) is found by combining the K residues:

$$s(x) = \sum_{k=0}^{K-1} a^{(k)}(x)\, s^{(k)}(x) \bmod m(x) \tag{2-9}$$

where the $a^{(k)}(x)$ are the reconstruction polynomials given by the Chinese Remainder
Theorem for polynomials.
The Winograd short convolution algorithm may be expressed using matrix notation
[3]; this allows a more compact representation of the Winograd convolution:

$$\mathbf{s} = \mathbf{C}\left[(\mathbf{A}\mathbf{g}) \cdot (\mathbf{B}\mathbf{d})\right] = \mathbf{C}\left[\mathbf{G} \cdot \mathbf{D}\right] = \mathbf{C}\mathbf{S} \tag{2-10}$$

where

g = vector of coefficients of g(x) in (2-5)

d = vector of coefficients of d(x) in (2-5)

A = M(N) x N matrix of the residues of g(x)

B = M(N) x N matrix of the residues of d(x)

C = N x M(N) matrix of coefficients for the residues of S

S = M(N) vector of the residues of s(x)

M(N) = the number of multiplications required for an N-point convolution

and the dot denotes element-by-element multiplication of the vectors G = Ag and D = Bd.

The form of (2-10) may be massaged to yield three matrices on the right hand side
of the equation: a pre-addition matrix, a post-addition matrix, and a diagonal
multiplication matrix. The pre-addition and post-addition matrices are composed of zeroes and
ones (the ones may be positive or negative). The multiplication matrix is a diagonal
matrix of coefficients. These coefficients may be computed and stored before they are
needed. The advantages of the matrix form of the short Winograd convolution
algorithm will become clear when the small Winograd DFT module is discussed.
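
To make the matrix form concrete, the sketch below works the smallest case, a 2-point cyclic convolution, where $x^2 - 1$ factors into $(x - 1)(x + 1)$ and only two multiplications are needed instead of four. The matrices A, B, and C are written out for this toy case as an illustration and are not taken from the thesis. The factor 1/2 sits in A, which acts on the fixed sequence g, so the products Ag can be computed ahead of time and stored as the diagonal multiplication coefficients.

```python
from fractions import Fraction

# Illustrative matrices for a 2-point cyclic convolution (assumed example).
A = [[Fraction(1, 2), Fraction(1, 2)],
     [Fraction(1, 2), Fraction(-1, 2)]]   # acts on the fixed sequence g
B = [[1, 1], [1, -1]]                     # pre-additions on the data d
C = [[1, 1], [1, -1]]                     # post-additions

def matvec(M, v):
    """Multiply matrix M (list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def winograd_cyclic_2(g, d):
    """Compute s = C[(Ag) . (Bd)] for N = 2 using two multiplications."""
    G = matvec(A, g)                      # precomputable when g is fixed
    D = matvec(B, d)                      # additions only
    S = [a * b for a, b in zip(G, D)]     # the only two multiplications
    return matvec(C, S)                   # additions only

if __name__ == "__main__":
    # Direct evaluation: s0 = 1*3 + 2*4 = 11, s1 = 1*4 + 2*3 = 10.
    print(winograd_cyclic_2([1, 2], [3, 4]))  # [Fraction(11, 1), Fraction(10, 1)]
```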

One of the reasons for computing the cyclic convolution directly is to reduce the
number of operations, especially the number of multiplications. Using the direct method
of computing the cyclic convolution (2-5), one needs $2N^2$ real multiplications (assuming
complex input data), whereas Winograd's algorithm requires
$2\sum_{k=0}^{K-1} \left[\deg m^{(k)}(x)\right]^2$ multiplications.
For example, the linear convolution of a real 3-point vector with a real
2-point vector requires six multiplications using the direct method and five multiplications
using Winograd's method [3]. Although the savings in the number of multiplications in
the example is not great, it illustrates the idea. Thus, by using Winograd's small
convolution algorithm, the number of multiplications is reduced. However, we still must show
the number of operations required to compute the DFT using a cyclic convolution is less
than the number of operations required to compute the DFT using one of the Fast
Fourier Transform (FFT) algorithms.

2.3.2. Rader's Prime Algorithm. Rader's prime algorithm is the link between cyclic
convolutions and DFTs [20]. An N-point DFT may be computed using a cyclic
convolution of length (N-1) and index scrambling if N is a prime number. First, find a primitive
element $\pi$ of a finite field of N elements (this finite field is referred to as a Galois field
and is denoted by GF(N)). Each nonzero integer in GF(N) can be written as a unique power of
$\pi$. Now, the DFT

$$V_k = \sum_{i=0}^{N-1} W_N^{ik}\, v_i, \qquad k = 0, 1, \ldots, N-1 \tag{2-11}$$

can be rewritten by breaking out the zero frequency and zero time components:

$$V_0 = \sum_{i=0}^{N-1} v_i \tag{2-12a}$$

$$V_k = v_0 + \sum_{i=1}^{N-1} W_N^{ik}\, v_i, \qquad k = 1, 2, \ldots, N-1 \tag{2-12b}$$

This "breaking out" was done since zero cannot be expressed as a power of the primitive
element $\pi$. Let r(i), defined on {1, 2, ..., N-1}, be a unique mapping of i, also defined on
{1, 2, ..., N-1}, such that:

$$\pi^{r(i)} = i \pmod{N} \tag{2-13}$$

Thus, r(i) is simply a permutation of i. Now, (2-12b) can be expressed as:

$$V_{\pi^{r(k)}} = v_0 + \sum_{i=1}^{N-1} W_N^{\pi^{r(i)+r(k)}}\, v_{\pi^{r(i)}}$$

or, substituting l = r(k) and j = N - 1 - r(i):

$$V'_l - v_0 = \sum_{j=0}^{N-2} W_N^{\pi^{\,l-j}}\, v'_j \tag{2-14}$$

where $V'_l = V_{\pi^{l}}$ and $v'_j = v_{\pi^{N-1-j}}$ are scrambled output and input sequences, respectively.

This last equation (2-14) can be recognized as a cyclic convolution between $v'_j$ and
$W_N^{\pi^{j}}$ (reference (2-5)). The cyclic convolution can be evaluated using the methods described
in paragraph 2.3.1. This procedure, Rader's prime algorithm coupled with Winograd's
small convolution algorithm, gives rise to the Winograd Small DFT algorithm.
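
The index scrambling in Rader's prime algorithm is easier to follow in code. The Python sketch below is an illustration under stated assumptions: the blocklength is an odd prime, the primitive element is found by brute force, and the length N-1 cyclic convolution is evaluated directly rather than with Winograd's short convolution algorithm. The function names are hypothetical.

```python
import cmath

def primitive_root(p):
    """Smallest primitive element of GF(p) for an odd prime p (brute force)."""
    for g in range(2, p):
        if len({pow(g, e, p) for e in range(p - 1)}) == p - 1:
            return g
    raise ValueError("no primitive element found; p must be an odd prime")

def rader_dft(v):
    """N-point DFT (N an odd prime) via a cyclic convolution of length N-1."""
    N = len(v)
    M = N - 1
    root = primitive_root(N)
    W = cmath.exp(-2j * cmath.pi / N)
    # Scrambled input v'_j = v[root^(N-1-j)] and kernel W^(root^j), j = 0..N-2.
    v_s = [v[pow(root, M - j, N)] for j in range(M)]
    kernel = [W ** pow(root, j, N) for j in range(M)]
    # Direct cyclic convolution; a short convolution algorithm could replace it.
    conv = [sum(kernel[(l - j) % M] * v_s[j] for j in range(M)) for l in range(M)]
    V = [0j] * N
    V[0] = sum(v)                          # zero-frequency term, broken out
    for l in range(M):
        V[pow(root, l, N)] = v[0] + conv[l]  # unscramble the outputs
    return V

if __name__ == "__main__":
    print(primitive_root(17))  # 3
```

For the 17-point module of interest here, the convolution length is 16, and 3 is the smallest primitive element of GF(17).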


2.4.  Winograd  DFT  Modules. 

Information from the previous section showed how to compute DFTs of prime
length from cyclic convolutions. The reason for doing so was to reduce the number of
operations, especially the number of multiplications, involved in the DFT computation.
This section deals with the methods for computing DFTs of lengths 15, 16, and 17, using
the general approach described in paragraph 2.3. First, Winograd's small DFT
algorithm will be described for DFTs whose blocklength is either a prime or a power of a
prime. Then, the fitting together of small Winograd modules to form a large Winograd
module will be discussed. Together, both types of Winograd modules will be used in the
Good-Thomas PFA to create DFTs with still longer blocklengths (reference paragraph
2.5).

2.4.1. Winograd's Small DFT Algorithm. Constructing a DFT using Winograd's
small DFT algorithm requires knowledge of Rader's prime algorithm and Winograd's
short convolution algorithm. This small DFT algorithm provides an efficient mechanism
for calculating DFTs with short blocklengths; in this case, the blocklengths of interest
are 3, 5, 16, and 17. For the DFTs whose blocklength is a prime number (e.g., 3, 5, and
17), Winograd's small DFT algorithm has three steps:

1) Change the DFT to a cyclic convolution using Rader's prime algorithm.

2) Compute the cyclic convolution using Winograd's short convolution algorithm.

3) Incorporate the scrambling required by Rader's prime algorithm by permuting
the rows of the post-addition matrix and the columns of the pre-addition matrix.

The 16-point DFT cannot be computed using this method since 16 is a power of 2;
Winograd's small DFT algorithm requires three steps for DFTs whose blocklength is a
power of 2 (that is, $N = 2^m$):

1) Compute a $2^{m-1}$-point DFT on the even indices.

2) Compute a $2^{m-2}$-point DFT, preceded by $2^{m-2} - 1$ complex multiplications, on
the odd indices.

3) Compute two polynomial products modulo $x^{2^{m-3}} + 1$ (an irreducible polynomial).

Both of the above algorithms are presented in much greater detail in Blahut [3]
and in McClellan and Rader [16]. The equations describing the Winograd small DFT
algorithm for DFTs of lengths 3, 5, 16, and 17 can be found in several references
[3, 7, 16, 25, 29]. Table 2-1 shows the number of operations required for each DFT,
using the small Winograd DFT algorithm; entries are given for the Cooley-Tukey radix-2
algorithm for DFTs of lengths 4, 8, and 16 for comparison.

The entries in Table 2-1 are given for complex input data. The entries for the
Winograd algorithm include trivial multiplications (by ±1 or ±j). The Winograd
entries in Table 2-1 were tabulated in Winograd [29] and Blahut [3]. The
Cooley-Tukey entries were tabulated in Blahut [3]. As seen in the direct comparisons for the
blocklengths of 4, 8, and 16, the small Winograd DFT algorithm requires a similar
number of additions as the Cooley-Tukey radix-2 FFT, but substantially fewer
multiplications. It is this savings in multiplications which brought about the interest in using
the small Winograd DFT algorithm. These savings in the number of multiplications will
become even greater when the large Winograd DFT algorithm is used.

Table 2-1. Comparison of Short Blocklength DFT Algorithms

DFT Size    Small Winograd            Cooley-Tukey Radix-2
            multiplies    adds        multiplies    adds

2.4.2. Winograd's Large DFT Algorithm. Winograd's small DFT algorithm yields
computationally efficient DFT modules; however, most applications require blocklengths
which are much longer than those given by Winograd's small DFT algorithm.
Winograd's large DFT algorithm provides these longer blocklengths by combining small Winograd
modules of relatively prime blocklengths into an operation which yields a blocklength equal to the
product of the blocklengths of the small Winograd modules. There are five steps in constructing a large
Winograd module from small Winograd modules:

1) Scramble the inputs using the Chinese Remainder Theorem;

2) Nest the pre-additions;

3) Nest the multiplications;

4) Nest the post-additions;

5) Unscramble the outputs using the Chinese Remainder Theorem.

2.4.2.1. Index Mapping. Understanding the Chinese Remainder Theorem (CRT) is
essential to building a large Winograd module. The CRT requires the factors which
comprise the composite blocklength to be mutually prime. As a brief example, consider
a DFT of length fifteen (15). Fifteen can be written as a product of two mutually prime
factors:

$$15 = n_1 \times n_2$$

where $n_1$ and $n_2$ are the two factors (3 and 5).

The  input  mapping  is  described  by  the  equations: 


k  (  =  ( N 1  X  k  )  mod  n  , 

k2  —  (N1  '  X  k)  mod  n2 

(iV'  X  n !  X  k)'+  (N1  1  X  »2  X  fc)  =  1 

where 

k1 indicates the row of the input array
k2 indicates the column of the input array

These two-dimensional maps are changed back into one dimension by stacking the
columns (reference Figures 2-1 and 2-2).
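
A small sketch may help show why mutually prime factors make the two-dimensional map one-to-one. The code below is an assumption-laden illustration, not the exact map used in this thesis: it uses the simplest Chinese Remainder Theorem map, with row index k mod n1 and column index k mod n2, and recovers k from the pair using coefficients that satisfy a relation of the form N1 x n1 + N2 x n2 = 1. All function names are hypothetical.

```python
def bezout(n1, n2):
    """Return (N1, N2) with N1*n1 + N2*n2 == 1 using the extended Euclidean algorithm."""
    old_r, r = n1, n2
    old_s, s = 1, 0
    old_t, t = 0, 1
    while r:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_s, s = s, old_s - q * s
        old_t, t = t, old_t - q * t
    if old_r != 1:
        raise ValueError("factors must be mutually prime")
    return old_s, old_t

def crt_map(n1, n2):
    """table[k1][k2] = k, where k1 = k mod n1 (row) and k2 = k mod n2 (column)."""
    table = [[None] * n2 for _ in range(n1)]
    for k in range(n1 * n2):
        table[k % n1][k % n2] = k
    return table

def crt_inverse(k1, k2, n1, n2):
    """Recover k from its row and column indices."""
    N1, N2 = bezout(n1, n2)
    return (k1 * N2 * n2 + k2 * N1 * n1) % (n1 * n2)

def stack_columns(table):
    """Change the two-dimensional map back into one dimension, column by column."""
    return [table[r][c] for c in range(len(table[0])) for r in range(len(table))]

if __name__ == "__main__":
    for row in crt_map(3, 5):
        print(row)
    # [0, 6, 12, 3, 9]
    # [10, 1, 7, 13, 4]
    # [5, 11, 2, 8, 14]
```

Every index from 0 to 14 appears exactly once in the 3 x 5 array; with factors that share a common divisor some cells would be hit twice and others left empty.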


2.4.2.2. Nesting of Additions and Multiplications. The nesting of the multiplications
and additions in the large Winograd DFT algorithm is accomplished using the
associative property of Kronecker products. The following example indicates the use of
Kronecker products:

$$N = N_1 \times N_2$$

W = matrix representation of the N-point DFT
  = CBA, so that the transform of x is Wx = CBAx

W_1 = matrix representation of the N_1-point DFT
    = C_1 B_1 A_1

W_2 = matrix representation of the N_2-point DFT
    = C_2 B_2 A_2

where

A, A_1, A_2 = pre-addition matrices

B, B_1, B_2 = multiplicative matrices

C, C_1, C_2 = post-addition matrices

x = input vector (N elements)

The output vector, Y, may be written using Kronecker products:

$$Y = (W_1 \otimes W_2)\, x = \left[(C_1 B_1 A_1) \otimes (C_2 B_2 A_2)\right] x = (C_1 \otimes C_2)(B_1 \otimes B_2)(A_1 \otimes A_2)\, x$$

One of the advantages of using the large Winograd DFT algorithm is that all like
operations are combined into a single matrix (i.e., the pre-additions are combined into
one matrix of pre-additions and the same for the multiplications and post-additions).
Combining the multiplications into a single matrix saves a considerable number of
operations [3, 14]. Since the multiplication matrices are diagonal, the total number of
multiplications required is simply the product of the number of multiplications required for
each small Winograd module which comprises the large Winograd module (i.e., the
number of multiplications is equal to the number of diagonal elements in B_1 times the
number of diagonal elements in B_2). Figure 2-3 shows the combination of the
pre-additions and the post-additions into single matrices, while Figure 2-4 shows the nesting
of the multiplications inside the additions.
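
The nesting step rests on a Kronecker product identity, often called the mixed-product property, which lets the Kronecker product of two factored transforms be regrouped into one post-addition matrix, one diagonal multiplication matrix, and one pre-addition matrix. The sketch below checks this numerically. The small integer matrices are arbitrary placeholders chosen only so the shapes are compatible; they are not the actual 3-point or 5-point Winograd matrices.

```python
def matmul(P, Q):
    """Product of two matrices stored as lists of rows."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]

def kron(P, Q):
    """Kronecker product of P and Q."""
    return [[p * q for p in prow for q in qrow] for prow in P for qrow in Q]

def diag(values):
    """Square diagonal matrix with the given diagonal entries."""
    n = len(values)
    return [[values[i] if i == j else 0 for j in range(n)] for i in range(n)]

# Placeholder factors (assumed, illustrative): W1 = C1 B1 A1 and W2 = C2 B2 A2.
A1 = [[1, 1], [1, -1], [0, 1]]
B1 = diag([2, 3, 5])
C1 = [[1, 0, 1], [0, 1, -1]]
A2 = [[1, 0, 1], [0, 1, 1], [1, -1, 0], [1, 1, 1]]
B2 = diag([7, 1, 2, 3])
C2 = [[1, 1, 0, 0], [0, 1, 1, 0], [1, 0, 0, -1]]

# Kronecker product of the two factored transforms ...
separate = kron(matmul(C1, matmul(B1, A1)), matmul(C2, matmul(B2, A2)))
# ... equals nested post-additions, multiplications, and pre-additions.
nested = matmul(kron(C1, C2), matmul(kron(B1, B2), kron(A1, A2)))
# The nested multiplication matrix stays diagonal: 3 * 4 = 12 multiplications.
nested_multiplies = sum(1 for i, row in enumerate(kron(B1, B2)) if row[i] != 0)

if __name__ == "__main__":
    print(separate == nested, nested_multiplies)  # True 12
```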
Figure 2-3 Nesting of Additions in 15-Point Large Winograd Module

[Figure stages: 3-point pre-adds, 5-point pre-adds, 3-point multiplies,
5-point multiplies, 5-point post-adds, 3-point post-adds]



2.5.  Good-Thomas  Prime  Factor  Algorithm. 

The Winograd modules, discussed in the previous section, provide an efficient
means of computing short- and medium-length DFTs. One of the drawbacks of using
the large Winograd modules is that the matrices become large and unwieldy, imposing
serious memory constraints for DFT computation (e.g., a 255-point transform computed
using the large Winograd algorithm requires a multiplication matrix of size 1280). The
Good-Thomas prime factor algorithm (PFA) requires more overall multiplications than the
large Winograd algorithm. But, since the DFT is broken into several component parts,
the total number of multiplications is resolved into several manageable subproblems
(e.g., a 255-point DFT computed using the Good-Thomas PFA requires multiplication
matrices of size 18 and 36 for the 15-point module and the 17-point module, respectively;
reference Table 2-2 for total number of operations). Also, the structure of the
Good-Thomas PFA lends itself to pipelined architectures in circuit design (reference
Chapter 3), which means the pre-addition matrix may be working on one problem while the
multiplication and post-addition matrices are working on the previous problem (this gives
better utilization of arithmetic resources). Putting together a PFA algorithm involves:

1)  Creating  input  and  output  maps  of  the  indices; 

2) Choosing efficient DFT modules whose blocklengths are mutually prime (i.e., no
common factors).

The index mapping is identical to that used in the large Winograd algorithm since
the mapping routine for the PFA also uses the Chinese Remainder Theorem (C.R.T.).
The index mapping is performed prior to the first module and after the last module. For
example, a 4080-point transform has factors 15, 16, and 17. One possible sequence of
operations is described below:

1) Perform the input mapping;

2) Compute 255 16-point DFTs;

3) Compute 272 15-point DFTs;

4) Compute 240 17-point DFTs;

5) Perform the output mapping.

Steps 2, 3, and 4 can be rearranged into any order.
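
The sequence of mapping, short DFTs, and unmapping can be sketched for two mutually prime factors. This is a minimal illustration, not the thesis's circuit: the particular input and output index maps below are one common Good-Thomas/C.R.T. pairing and are an assumption (the maps of Section 2.4.2.1 may be assigned differently), a direct DFT stands in for the small Winograd modules, and the names `dft` and `pfa_dft` are hypothetical. The three-argument `pow` with exponent -1 needs Python 3.8 or later.

```python
import cmath

def dft(x):
    """Direct evaluation of the DFT; stands in for a short Winograd module."""
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * cmath.pi * m * k / n) for m in range(n))
            for k in range(n)]

def pfa_dft(x, n1, n2):
    """Two-factor Good-Thomas PFA; n1 and n2 must have no common factors."""
    n = n1 * n2
    if len(x) != n:
        raise ValueError("input length must equal n1 * n2")
    t1 = pow(n1, -1, n2)  # inverse of n1 modulo n2 (raises if not coprime)
    t2 = pow(n2, -1, n1)  # inverse of n2 modulo n1

    # 1) Input map: one-dimensional index -> (i1, i2) grid position.
    grid = [[x[(n2 * i1 + n1 * i2) % n] for i2 in range(n2)] for i1 in range(n1)]

    # 2) n2 DFTs of length n1 (one per column); cols[i2][k1].
    cols = [dft([grid[i1][i2] for i1 in range(n1)]) for i2 in range(n2)]

    # 3) n1 DFTs of length n2 (one per row); rows[k1][k2].
    rows = [dft([cols[i2][k1] for i2 in range(n2)]) for k1 in range(n1)]

    # 4) Output map (Chinese Remainder Theorem): (k1, k2) -> one-dimensional index.
    out = [0j] * n
    for k1 in range(n1):
        for k2 in range(n2):
            out[(n2 * t2 * k1 + n1 * t1 * k2) % n] = rows[k1][k2]
    return out
```

Calling `pfa_dft(x, 3, 5)` computes five 3-point DFTs followed by three 5-point DFTs. No twiddle-factor multiplications appear between the two passes, which is what distinguishes the PFA from a Cooley-Tukey decomposition.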

The short DFT modules may be designed using any algorithm (e.g., Cooley-Tukey
radix-2, Winograd small, Winograd large, etc.); the only constraint is that the
blocklengths of the modules must have no common factors. Winograd modules are well-suited
for use within the Good-Thomas PFA since small Winograd modules have blocklengths which
are either prime or a power of a prime, while large Winograd modules also require
mutually prime factors. The idea behind the Good-Thomas PFA is to change a
one-dimensional DFT into a p-dimensional DFT, where p is the number of mutually prime
factors (N1, N2, ... Np) which compose the overall blocklength, N. In the two-factor case
(i.e., N = N1 x N2), N1 N2-point DFTs are computed, then N2 N1-point DFTs. The
order of computing the DFTs does not alter the total number of operations (additions
and multiplications). The number of multiplications will be greater than or equal to
the number of multiplications required by a large Winograd module of equal
length. However, the number of multiplications for each block is fewer than the number
required for a large Winograd DFT (if small Winograd DFT modules are used in the
PFA). Table 2-2 shows a comparison between the Good-Thomas PFA (using small
Winograd modules) and the large Winograd DFT algorithm; the entries for Table 2-2 are
for complex input data.

The number of multiplications does not include trivial multiplications (by ±1 or
±j). Entries for Table 2-2 were computed using equations from Kolba and Parks [14].
While the number of multiplications needed for the PFA is approximately twice that
required for the large Winograd, the number of additions required for the Good-Thomas
PFA is less. Other algorithms, such as the Cooley-Tukey radix-2, require many more
multiplications than the PFA (e.g., a 256-point DFT for complex inputs using the
Cooley-Tukey radix-2 requires 2048 real multiplications). Figure 2-5 shows a 15-point
DFT computed using the PFA; five 3-point DFTs are calculated, then three 5-point
DFTs.


          Comparison of Long Blocklength DFT Algorithms

   DFT         Good-Thomas PFA            Large Winograd
   Size      multiplies      adds      multiplies      adds

     15           50           81           34           81
    240         1100         4812          632         5136
    255         1900         7464         1280         8406
    272         1640         7540         1280         8168
   4080        31148       157164        23312       189048

                           Table 2-2


2.6.  Summary 

The material in this chapter has shown how to reduce the number of operations
necessary to compute a DFT. First, the class of FFT algorithms was examined. The
FFT algorithms reduced the number of operations by exploiting the conjugate symmetry
and periodicity properties of the DFT computation. However, the I/O bandwidth of the
FFT was poor due to frequent memory references for both data and coefficients. Thus, a
different class of algorithms, developed by Winograd, was analyzed. The Winograd
algorithms, which have been shown to use the fewest number of multiplications for DFT
computation [29], were developed using cyclic convolutions to compute DFTs with short
blocklengths. For longer blocklengths, or for those blocklengths which are not prime,
the large Winograd and Good-Thomas PFA were shown to be efficient algorithms for
DFT computation. The material in the next chapter shows how to implement these
algorithms in a VLSI architecture.

Chapter 3

VLSI  Implementation 


3.1.  Overview 

The material from the previous chapter showed the algorithms used to compute the
DFT. Now, we need an architecture which can implement those algorithms in a very
large-scale integrated (VLSI) circuit. VLSI circuits lend themselves to regular, parallel
structures [15]. The matrix form of the Winograd algorithms maps easily into VLSI
circuits, with regular structures of adder/subtractors and multipliers (although
determining the routing between the elements is not a trivial task). Also, the structure
of the Winograd algorithm lends itself to pipelining, the ability to work on more than one
problem at a given time. The information in this chapter is provided for both the 16-point
Winograd processor and the 4080-point PFA processor. First, the overall layout, signal
flow, and physical characteristics for both circuits are given. Then, fault tolerance is
presented from a system-design viewpoint. Finally, the computational throughput of the
PFA system is described.


3.2.  Winograd  Processors. 

The signal flow, circuit layouts, and physical characteristics will be discussed for
the 16-point Winograd processor (the 16-point processor being chosen as representative
of the Winograd processors). The differences for the 15-point and 17-point chips
relate to the internal data representation (30 bits for the 15-point and 34 bits for the
17-point versus 32 bits for the 16-point) and the number of sign extensions required to
prevent arithmetic overflow (4 sign extensions in the 15-point and 5 in the 17-point
versus 3 sign extensions in the 16-point). The internal data representation will change
the time to compute a DFT (reference paragraph 3.5). The need for sign extensions to
prevent arithmetic overflow is discussed in more detail in paragraph 4.2.2.

The data flow for the 16-point chip is shown in Figure 3-1.

Figure 3-1 16-Point Winograd Processor

As seen in Figure 3-1, the circuits of the 16-point processor may be grouped into three
categories:

1) Input/Output (I/O);

2) Control;

3) Arithmetic.


The Parallel In/Serial Out (PISO) registers take inputs from the data memory, while the
Serial In/Parallel Out (SIPO) registers send results to the data memory; there are two
sets of each type of register, one set for real data/results and one set for imaginary
data/results. Also, there is a 4-word buffer which holds the 12-bit addresses for the data
memory. There are two types of control circuits; one type is for on-chip control of
timing and address generation, while the other is for off-chip communication (scaling
control, handshake signalling, etc.). The arithmetic circuits reflect the three components
of the Winograd modules: a pre-addition matrix, a post-addition matrix, and a
multiplication matrix. Adder/subtractors and multipliers compose most of the arithmetic
circuitry; there are also several reset circuits for clearing intermediate results before a
new 16-point DFT is started. The signal flow through the 16-point processor is described
below:

1)  The  host  processor  sends  an  OPERATE  signal  to  the  16-point  processor,  so  the 
processor  may  begin  computing  DFTs; 

2)  Sixteen  24-bit  data  words  are  loaded  from  the  input  memory  to  the  real  and 
imaginary  PISO  registers; 

3)  After  all  sixteen  data  words  are  loaded,  the  PISO  registers  latch  the  words  into 
the  serial  output  portion  of  each  register; 

4a) The input data words then are serially shifted into the arithmetic circuitry,
after passing through a parity check cell which flags any parity errors on the input data;

4b) While the first set of data words is being shifted out serially, the next set of
data words is being loaded, in parallel, into the PISO register from the data memory
(reference paragraph 3.2.1);

5) The data pass through the matrix of pre-adders, multipliers, and post-adders
(this is where the 16-point DFT is computed);

6) Before being sent to their respective SIPO registers, the real and imaginary
results are sent through the parity/rounding cell (which rounds the 32-bit arithmetic
results to 23-bit output results and computes a parity bit to be appended to the 23-bit
result);

7) After all twenty-three data bits and the single parity bit have been shifted into
the SIPO registers, the result words are latched into the parallel portion of each register;

8) The 24-bit results are shifted out of the SIPO register, through the scaling cell
(which checks the most significant seven bits of each output data word to find the
smallest number of sign extensions for all 4080 words), and out to the output memory;

9) This 16-point processor continues to compute DFTs until it has exhausted all
the addresses for that particular DFT blocklength (i.e., 255 16-point DFTs will be
computed in the 4080-point PFA system).

Each  type  of  circuitry  for  the  Winograd  processors  is  discussed  in  greater  detail  in  the 
following  paragraphs. 

3.2.1. Input/Output Circuitry. There are two types of I/O circuitry, one for data
and one for addresses. The data I/O circuitry consists of the PISO and SIPO registers.
Each type of data register has two master-slave flip-flops in each cell. One flip-flop is
used to store the input data, while the other is used to store the output data. For
example, the input flip-flops in each cell of the PISO array hold the input data as each
24-bit word is shifted into the PISO register. When the latch signal is activated, the bits
in the input flip-flops are copied into the output flip-flops. Then, the outputs may be
shifted out (serial shift out) while a new set of inputs is being shifted in (parallel shift
in). Figures 3-2a, 3-2b, and 3-2c show the sequence of shifting in, latching, and shifting
out for the PISO register. The operation of the SIPO register is similar, except the input
shifting is done serially while the output shifting is done in parallel fashion.
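
The two-flip-flop arrangement behaves like a double buffer, which a small behavioral model can make concrete. This is a sketch for a single 24-bit word only, not a description of the actual cell design: the class name `PisoRegister`, its method names, and the least-significant-bit-first shift order are all assumptions made for illustration.

```python
class PisoRegister:
    """Toy model of one double-buffered parallel-in/serial-out word."""

    def __init__(self, width=24):
        self.width = width
        self.input_ff = 0   # input flip-flops: word being loaded in parallel
        self.output_ff = 0  # output flip-flops: word being shifted out serially

    def load(self, word):
        """Parallel shift in: store a new word without disturbing the output side."""
        self.input_ff = word & ((1 << self.width) - 1)

    def latch(self):
        """Latch signal: copy the input flip-flops into the output flip-flops."""
        self.output_ff = self.input_ff

    def shift_out(self):
        """Serial shift out: emit one bit per call (LSB first is an assumption)."""
        bit = self.output_ff & 1
        self.output_ff >>= 1
        return bit

reg = PisoRegister()
reg.load(0xABCDEF)
reg.latch()
reg.load(0x123456)                            # next word loads while the first shifts out
first_bits = [reg.shift_out() for _ in range(24)]
first_word = sum(b << i for i, b in enumerate(first_bits))
```

Loading the second word before the first has finished shifting leaves `first_word` unaffected. This is the overlap of parallel input and serial output that the latch makes possible.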


Figure 3-2c PISO Cell-Parallel Shift In, Serial Shift Out


The address buffer holds four 12-bit addresses; the addresses are stored sequentially
in the buffer (i.e., first, second, third, and fourth). Since the longest blocklength is 4080,
twelve bits suffice for the addressing ($2^{12} = 4096$). The data address is loaded
onto the address bus during one clock cycle; the data words from the memory are loaded
onto the data bus on the next clock cycle (reference paragraph 3.2.3 for memory
requirements). A more complete description of the address generation process is given in
paragraph 3.2.2.


3.2.2. Control Circuitry. There are two types of control circuitry: one is for off-chip
communication, while the other is for on-chip communication. Off-chip communication
consists of three handshake signals and a three-bit scaling factor. The three handshake
signals are OPERATE, DONECOMP, and DONEIN. The OPERATE signal is
originated by the interface chip and is sent to the Winograd processor; it starts the
processor computing DFTs. The DONEIN signal is originated by a Winograd processor and is
sent to the interface processor; it indicates the Winograd processor has finished its pass
through the input data. The DONECOMP signal is originated by the Winograd
processor and is sent to the interface processor; it indicates the Winograd processor has
finished its DFT computations. Each Winograd processor computes an output scaling
factor which indicates the largest magnitude of any data word in a particular DFT
computation. Refer to paragraph 3.3 for more information on the handshake signals.

The scaling factor is a three-bit number passed from one processor to the interface
processor. The scaling factor is a direct indication of the smallest number of sign
extensions for the given set of 4080 data words. The scaling factor is computed as the data
words are being transmitted from the SIPO register to the data memory; the scaling
factor is latched into a special register when a particular Winograd processor has finished
its pass through the data. When the interface processor receives the DONECOMP
signal from a Winograd processor, it reads the scale factor from the register and sends it
to the next processor in the system (reference paragraph 3.3). The host may or may not
provide a scaling factor (depending on system configuration); if the host does not provide
the scaling factor, the first Winograd processor assumes a scale factor of zero.
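
How a scale factor of this kind might be derived can be sketched in a few lines. The sketch rests on several assumptions not stated in the text: the 23-bit data field is two's complement with the sign in the top bit, a "sign extension" is a bit directly below the sign bit that repeats it, and only the top seven bits are examined (as in the signal-flow description), so the count saturates at six. The function names are hypothetical.

```python
def sign_extensions(word, width=23, window=7):
    """Count bits below the sign bit that repeat it, looking only at the
    top `window` bits of a `width`-bit two's complement pattern."""
    sign = (word >> (width - 1)) & 1
    count = 0
    for pos in range(width - 2, width - 1 - window, -1):
        if (word >> pos) & 1 != sign:
            break
        count += 1
    return count

def block_scale_factor(words, width=23, window=7):
    """Smallest sign-extension count over a block: the largest-magnitude
    word determines how much headroom the whole block has."""
    return min(sign_extensions(w, width, window) for w in words)

MASK23 = (1 << 23) - 1
block = [0, 1 << 18, (-(1 << 20)) & MASK23]   # three sample 23-bit patterns
factor = block_scale_factor(block)
```

Taking the minimum over the block reflects the statement that the factor indicates the largest magnitude of any data word: the word with the fewest redundant sign bits governs the scaling of all of them.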

There are two types of on-chip control circuitry; one is used to implement the
processor timing diagram, while the other is used for address generation. The processor
timing diagram reflects the temporal relationship of internal control signals necessary for
the arithmetic circuitry to correctly perform a 16-point DFT calculation (e.g., shifting,
latching, etc.) and to enable special circuits (parity, rounding, etc.).

The parity check cell between the PISO register and the arithmetic circuitry is
enabled for twenty-three clock cycles. This allows the parity check cell to compute the
parity on the input data word and compare its result to the input parity bit. There is
an arithmetic round signal which enables the multipliers to round the 60-bit results
(32 + 28 = 60) to thirty-two bits [6]. The round circuitry in the P/R cell between the
arithmetic circuitry and the SIPO register is active for twenty-four clock cycles. This
allows the round circuitry to operate on the most significant twenty-four bits of data,
rounding the result to twenty-three data bits for the SIPO register. The parity circuitry
in the P/R cell monitors the output of the round circuitry in the P/R cell; after the
twenty-three data bits have been sent to the SIPO register from the round circuitry, the
parity circuit appends a parity bit (computed from the 23 data bits).
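
The parity/rounding step can be illustrated with ordinary integer arithmetic. This is a sketch under stated assumptions rather than the circuit's exact behavior: rounding is taken to be round-half-up on the bit dropped from the top twenty-four bits, parity is taken to be even, overflow at the extreme positive value is ignored, and the function name `round_and_parity` is hypothetical.

```python
def round_and_parity(value32):
    """Reduce a signed 32-bit result to 23 data bits plus one parity bit.

    Returns (data23, parity), where data23 is the 23-bit two's complement
    pattern and parity makes the total number of ones even (an assumption).
    """
    if not -(1 << 31) <= value32 < (1 << 31):
        raise ValueError("value does not fit in 32 bits")
    top24 = value32 >> 8            # most significant 24 bits (arithmetic shift)
    rounded = (top24 + 1) >> 1      # drop one more bit, rounding half up
    data23 = rounded & ((1 << 23) - 1)
    parity = bin(data23).count("1") & 1
    return data23, parity
```

For example, `round_and_parity(0x180)` keeps the top twenty-four bits (value 1), rounds 0.5 up to 1, and returns a parity bit of 1 because the data field holds a single one.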

The shifting of output results from the SIPO register is straightforward and occurs
at the same relative time each 32-cycle period; however, the shifting of input data words
depends on the input scaling factor (from the previous processor). If scaling were not
implemented, five zeroes would be inserted at the least significant end of the incoming
data word and four sign extensions would be appended at the most significant end. The
four sign extensions allow for arithmetic growth in the DFT computation and for the
extra sign extension required by the multipliers [6]. Only three sign extensions are
required for arithmetic growth since the two results from the pre-addition matrix which
have more than eight terms are multiplied by one (i.e., trivial multiplications); thus, only
three, rather than four, sign extensions are required to allow for arithmetic growth. The
reasons for using sign extensions are explained in more detail in paragraph 4.2.2.

The zeroes change the arithmetic inputs to thirty-two bits (to balance the 32-cycle
period for reading in new data). If the input scaling factor is greater than or equal to
four, no additional sign extensions are required (since the incoming data words have at
least four sign extensions); thus, nine zeroes may be inserted at the least significant end
of each data word (9 + 23 = 32). If the input scaling factor is less than four, additional
sign extensions are needed: the number of sign extensions required is 4 - inscale and the
number of zeroes inserted is 5 + inscale (where inscale is the three-bit input scaling
factor). The effect of inserting the zeroes at the least significant end is to delay shifting
the 23-bit incoming data word out of the PISO register into the arithmetic circuitry.
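
The padding rule can be written as a small function to confirm that every case fills the 32-bit arithmetic word. The function name `input_padding` is hypothetical; the arithmetic follows the rule stated above.

```python
def input_padding(inscale):
    """Return (sign_extensions, zeroes) added to a 23-bit input word
    for a three-bit input scaling factor."""
    if not 0 <= inscale <= 7:
        raise ValueError("inscale is a three-bit value (0..7)")
    if inscale >= 4:
        # Incoming words already carry at least four sign extensions.
        return 0, 9
    return 4 - inscale, 5 + inscale

# Every scaling factor must pad the 23-bit word out to exactly 32 bits.
widths = [23 + sum(input_padding(s)) for s in range(8)]
```

With no scaling (`inscale` of zero) the function returns four sign extensions and five zeroes, matching the unscaled case described earlier.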

The address generation circuitry computes the input and output addresses. The
addresses are stored in a read-only memory (ROM). The order of the addresses is
governed by the Chinese Remainder Theorem (reference paragraph 2.3.2.1). The address
buffer holds four addresses since the access time of the ROM is too long to retrieve a
single address in two clock cycles. A pointer to the first of a group of four addresses is
loaded onto the ROM address bus; the four addresses are loaded from the ROM into the
address buffer. The pointer is incremented to the next set of four addresses and the
address generation process is repeated. Since different configurations may be used for
the overall DFT processor (i.e., compute 240-, 255-, or 272-point DFTs rather than
4080-point DFTs), any of the Winograd processors may be required to implement the
PFA input mapping; thus, each Winograd processor uses this method of address
generation. The output addresses are identical to the input addresses (i.e., the result is
stored at the same relative location in the output memory as the corresponding input was
taken from the input memory). Thus, the output addresses are merely delayed versions
of the input addresses. Rossbach [22] has more details on the control circuitry.
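
A table-driven address generator of this kind can be sketched as follows. The index formula used to fill the table is an assumption (one common Good-Thomas mapping; the actual ROM contents follow the mapping of the referenced paragraph and may differ), and the names `address_rom` and `fetch_groups` are hypothetical. The grouping into blocks of four follows the description above.

```python
def address_rom(n_short, n_rest):
    """Build an address table for one pass of short DFTs.

    n_short: blocklength of this processor's DFT (e.g. 16)
    n_rest:  product of the remaining factors (e.g. 15 * 17 = 255)
    Consecutive runs of n_short addresses feed one short DFT.
    """
    n = n_short * n_rest
    rom = []
    for dft_index in range(n_rest):
        for tap in range(n_short):
            # Assumed mapping; valid because n_short and n_rest are coprime.
            rom.append((n_rest * tap + n_short * dft_index) % n)
    return rom

def fetch_groups(rom, group=4):
    """Yield the four-address blocks that would fill the address buffer."""
    for pointer in range(0, len(rom), group):
        yield rom[pointer:pointer + group]

rom = address_rom(16, 255)
groups = list(fetch_groups(rom))
```

Each data address appears exactly once in the table, every address fits in twelve bits, and the same table can serve for output because results return to the locations the inputs came from.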


3.2.3. Arithmetic Circuitry. The adder/subtractors in the arithmetic circuitry
emulate the pre-addition and post-addition matrices of the Winograd modules. Since the
addition matrices contain both positive and negative entries, the elements which reflect
the operation of the addition matrices must support both addition and subtraction.
Multiplication by imaginary coefficients is reflected in the post-addition matrix (reference
paragraph 2.3.1). So, the results from the real and imaginary multiplication circuits
must be combined. This is indicated in Figure 3-1 by the dotted line separating the real
and imaginary post-addition circuits. The multipliers are bit-serial and employ a
modified Booth's algorithm [6]. Each multiplier cell represents two bits of the
coefficient. Thus, fourteen multiplier cells must be used for 28-bit coefficients. Since the
coefficients are known ahead of time, they may be hardwired into the multiplier, rather
than being stored in a coefficient memory. This reduces the time and area required to
perform a multiplication. The multipliers round the results from sixty bits (28 + 32 =
60) to thirty-two (32) bits. The use of rounding rather than truncation provides a better
signal-to-noise margin (reference paragraph 4.2.2). Coutee [6] has more details on the
operation of the arithmetic circuitry.
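
The relationship between two coefficient bits per cell and fourteen cells can be illustrated with the standard radix-4 (modified) Booth recoding. This is the textbook recoding, offered as a sketch; the text does not give the cell-level details of the multiplier in [6], and the function names are hypothetical.

```python
def booth_radix4_digits(coeff, bits=28):
    """Recode a signed `bits`-wide coefficient into bits/2 digits in {-2,...,2}.

    Digit i is b[2i] + b[2i-1] - 2*b[2i+1], with b[-1] = 0, so that
    coeff == sum(digit_i * 4**i).
    """
    if bits % 2:
        raise ValueError("bits must be even")
    if not -(1 << (bits - 1)) <= coeff < (1 << (bits - 1)):
        raise ValueError("coefficient does not fit")
    pattern = coeff & ((1 << bits) - 1)   # two's complement bit pattern
    digits = []
    prev = 0                              # implicit bit to the right of the LSB
    for i in range(0, bits, 2):
        b0 = (pattern >> i) & 1
        b1 = (pattern >> (i + 1)) & 1
        digits.append(b0 + prev - 2 * b1)
        prev = b1
    return digits

def booth_multiply(x, coeff, bits=28):
    """Multiply by summing one partial product per recoded digit (one per cell)."""
    return sum(d * x * 4 ** i for i, d in enumerate(booth_radix4_digits(coeff, bits)))
```

A 28-bit coefficient yields fourteen digits, one for each multiplier cell. Because each digit is fixed once the coefficient is known, hardwiring the coefficient amounts to fixing which of the multiples 0, ±x, or ±2x each cell contributes.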

3.2.4. Physical Characteristics of 16-Point Processor. There will be three phases of
fabrication and packaging for the Winograd processors. In the first phase, test macrocells
for parts of the arithmetic and control circuitry were designed using scalable 3-μm
design rules for complementary metal-oxide semiconductor (CMOS) technology. These
test cells were packaged in 32-pin dual in-line packages (DIPs). Test cells fabricated and
tested to date have worked at a clock rate of 50 MHz [6, 22].

After the design of the VLSI circuits is verified, the Winograd processor will be
fabricated using a 1.25-μm CMOS process. These single-chip processors should be
packaged using pin-grid arrays or chip carriers [8] to support the 144-pin requirements (96
data, 24 address, 12 control, and 12 power). Each processor should occupy one square
inch of surface area on a circuit board (assuming a 144-pin package).

The final phase of fabrication will bring the processors and required peripheral
devices (interface processor and memories; reference paragraph 3.3) into a hybrid circuit
package. The reasons for using the hybrid circuit are reduced surface area, increased
reliability (elimination of error-prone wire bonds), and increased I/O bandwidth (shorter
paths between processors and memories mean faster transitions are possible). The
target clock rate for the 1.25-μm chips is 70 MHz.

Table  3-1  contains  a  summary  of  Winograd  processor  characteristics. 


              Winograd Processor Characteristics

   Characteristic        Value

   Technology            CMOS
   Packaging             Pin-Grid Array or Chip Carrier
   Size                  1 sq. in. (a)
   Pin Count             144
   Clock Rate            70 MHz
   Power                 1 W (b)
                         100 mW (c)

   (a) estimated
   (b) active mode; estimated
   (c) standby mode; estimated

                           Table 3-1


3.3. 4080-Point PFA Processor. The Winograd processors described in the previous
section provide a means of computing DFTs. But there must be other devices which enable
the data to be sent to the Winograd processors and which store the results. Also, the
individual Winograd processors compute short blocklength DFTs; useful applications require
blocklengths of 256 (image processing) and 4096 (synthetic aperture radar and matched
filtering). Thus, a system which computes these longer blocklength DFTs needs to be
addressed. The 4080-point PFA system was chosen as a representative system. Systems for
computing DFTs with blocklengths of 240, 255, and 272 points will use one fewer Winograd
processor, memory controller, and data memory. The layout of the 4080-point PFA
processor is presented in Figure 3-3.

There are separate chips for each Winograd processor, the dual-port memories, and the
interface processor. The Winograd processors operate autonomously, computing DFTs
using either the small Winograd algorithm (16- and 17-point) or the large Winograd
algorithm (15-point). The dual-port memories allow one Winograd processor (or the host
processor) to fill one half of the memory while its neighboring processor reads from the
other half of the memory. The interface processor provides overall control of the
pipeline: asynchronous control signals from the memory controllers, the Winograd
processors, and the host flow through the interface processor.
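
The role of a dual-port memory between two pipeline stages can be modelled as a ping-pong buffer. This is a simplified behavioral sketch: in the system described here the switch is made by a memory controller in response to handshake signals, which the single `swap` call below only stands in for, and the class and method names are hypothetical.

```python
class PingPongMemory:
    """Two halves: the upstream stage fills one while the downstream stage
    reads the other; swap() exchanges the roles of the halves."""

    def __init__(self, size=4080):
        self.halves = [[0] * size, [0] * size]
        self.write_half = 0   # half currently owned by the upstream writer

    def write(self, addr, value):
        self.halves[self.write_half][addr] = value

    def read(self, addr):
        return self.halves[1 - self.write_half][addr]

    def swap(self):
        """Stand-in for the memory controller switching its control logic."""
        self.write_half = 1 - self.write_half

mem = PingPongMemory()
mem.write(5, 42)      # upstream stage fills the first half
mem.swap()            # controller switches the halves
mem.write(5, 7)       # upstream starts on the next block...
seen = mem.read(5)    # ...while downstream still sees the previous block
```

Because the reader and the writer never touch the same half, each stage can work on a different 4080-word block at the same time.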

Figure 3-3 4080-Point PFA Processor

The signal flow through the processor for one DFT is described below:

1) The host processor fills the left half of the input memory with data;

2) When the host finishes filling the left half of the input memory, it sends a handshake
signal to the first memory controller, indicating the left half of the input memory is
ready;


3a) The first memory controller switches the control logic for the input memory
such that the host is writing to the right half of the input memory and the 16-point
processor is reading from the left half of the input memory;

3b) After the memory controller has switched the logic for the input memory, the
host sends an OPERATE signal to the interface processor, which relays the OPERATE
signal to the 16-point processor (this initiates the 16-point pass through the data);

4) As the 16-point chip is computing 255 (15 x 17) DFTs, it sends the outputs to
the left half of memory 1;

5)  When  the  host  has  finished  filling  the  right  half  of  the  input  memory  with  new 
input  data,  it  sends  a  handshake  to  the  first  memory  controller,  indicating  the  right  half 
of  the  input  memory  is  ready; 

6)  When  the  16-point  processor  has  finished  reading  in  4080  data  words,  it  sends  a 
DONEIN  signal  to  the  interface  processor,  which  relays  the  DONEIN  signal  to  the  first 
memory  controller  (this  allows  the  first  memory  controller  to  switch  the  control  logic  of 
the  input  memory  such  that  the  host  is  writing  to  the  left  half  of  the  input  memory  and 
the  16-point  processor  is  reading  from  the  right  half  of  the  input  memory); 

7) When the 16-point processor has finished computing 255 16-point DFTs, it
sends a DONECOMP signal to the interface processor, which relays the DONECOMP
signal to the second memory controller (this allows the second memory controller to
switch the control logic of memory 1 such that the 16-point processor is writing to the
right half of memory 1 and the 15-point processor is reading from the left half of
memory 1) and to the host (indicating the 16-point processor has finished its pass
through the data);

8a) When the host has received the DONECOMP signal from the 16-point processor
(via the interface processor) and the host has finished filling the right half of the
input memory, the host sends an OPERATE signal to the interface processor, which
relays the OPERATE signal to the 16-point processor (this allows the 16-point processor
to begin its pass through the next set of 4080 data words);

8b) When the host has received the DONECOMP signal from the 16-point processor
(via the interface processor), the host sends an OPERATE signal to the interface
processor, which relays the OPERATE signal to the 15-point processor (this allows the
15-point processor to begin its pass through the first set of 4080 data words);

9) As the 15-point processor is computing 272 (16 x 17) DFTs, it sends the
outputs to the left half of memory 2;

10)  When  the  16-point  processor  has  finished  255  16-point  DFTs,  it  sends  a 
DONECOMP  signal  to  the  interface  processor  (this  allows  the  interface  processor  to  send 
a  handshake  to  the  second  memory  controller,  indicating  the  right  half  of  memory  1  is 
ready); 

11) When the 15-point processor has finished reading in 4080 data words, it sends
a DONEIN signal to the interface processor, which relays the DONEIN signal to the
second memory controller (this allows the second memory controller to switch the
control logic of memory 1 such that the 16-point processor is writing to the left half of
memory 1 and the 15-point processor is reading from the right half of memory 1);

12) When the 15-point processor has finished computing 272 15-point DFTs, it
sends a DONECOMP signal to the interface processor, which relays the DONECOMP
signal to the third memory controller (this allows the third memory controller to switch
the control logic of memory 2 such that the 15-point processor is writing to the right
half of memory 2 and the 17-point processor is reading from the left half of memory 2)
and to the host (indicating the 15-point processor has finished its pass through the data);

13a) When the host has received the DONECOMP signal from the 15-point processor
(via the interface processor) and the 16-point processor has finished filling the right
half of memory 1, the host sends an OPERATE signal to the interface processor, which
relays the OPERATE signal to the 15-point processor (this allows the 15-point processor
to begin its pass through the next set of 4080 data words);

13b) When the host has received the DONECOMP signal from the 15-point
processor (via the interface processor), the host sends an OPERATE signal to the interface
processor, which relays the OPERATE signal to the 17-point processor (this allows the
17-point processor to begin its pass through the first set of 4080 data words);

14)  As  the  17-point  processor  is  computing  240  (15  X  16)  DFTs,  it  sends  the  out¬ 
puts  to  the  left  half  of  the  output  memory; 

15)  When  the  15-point  processor  has  finished  272  15-point  DFTs,  it  sends  a 
DONECOMP  signal  to  the  interface  processor  (this  allows  the  interface  processor  to  send 
a  handshake  to  the  second  memory  controller,  indicating  the  right  half  of  memory  2  is 
ready); 

16)  When  the  17-point  processor  has  finished  reading  in  4080  data  words,  it  sends 
a  DONEIN  signal  to  the  interface  processor,  which  relays  the  DONEIN  signal  to  the 
third  memory  controller  (this  allows  the  third  memory  controller  to  switch  the  control 
logic  of  memory  2  such  that  the  15-point  processor  is  writing  to  the  left  half  of  memory 
2  and  the  17-point  processor  is  reading  from  the  right  half  of  memory  2); 

17) When the 17-point processor has finished computing 240 17-point DFTs, it
sends  a  DONECOMP  signal  to  the  interface  processor,  which  relays  the  DONECOMP 
signal  to  the  fourth  memory  controller  (this  allows  the  fourth  memory  controller  to 
switch  the  control  logic  of  the  output  memory  such  that  the  17-point  processor  is  writing 
to  the  right  half  of  the  output  memory  and  the  host  is  reading  from  the  left  half  of  the 
output  memory)  and  to  the  host  (indicating  the  17-point  processor  has  finished  its  pass 
through  the  data); 

18a) When the host has received the DONECOMP signal from the 17-point processor
(via the interface processor) and the 15-point processor has finished filling the right
half of memory 2, the host sends an OPERATE signal to the interface processor, which
relays the OPERATE signal to the 17-point processor (this allows the 17-point processor
to begin its pass through the next set of 4080 data words);

18b)  When  the  host  has  received  the  DONECOMP  signal  from  the  17-point  pro¬ 
cessor  (via  the  interface  processor),  it  begins  to  read  the  results  from  the  output 
memory. 

The salient features of the PFA system are the asynchronous control signals for
host-processor and processor-processor communications and the data flow between the
processors. The interface between the host and the Winograd processors is simple,
requiring only three handshake signals. The Winograd processors operate autonomously,
communicating with the host (via the interface processor) only at the beginning and end
of each DFT. The DFT problems are pipelined both within the Winograd processors
and in the PFA system, providing high throughput (reference paragraph 3.5.). Another
virtue of the system as shown in Figure 3-3 is that it may be reconfigured easily if
necessary (reference paragraph 3.5.).
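
The left-half/right-half memory switching used throughout the steps above can be
modeled in a few lines. The following is only an illustrative sketch: the class and
method names are hypothetical, and it assumes a controller switches its memory once it
has seen both the DONECOMP event of the processor writing the memory and the DONEIN
event of the processor reading it.

```python
class PingPongMemory:
    """Dual-port memory whose two halves exchange writer and reader roles."""

    def __init__(self, write_half="right", read_half="left"):
        self.write_half = write_half   # half currently filled by the upstream processor
        self.read_half = read_half     # half currently read by the downstream processor

    def swap(self):
        """Exchange the roles of the two halves."""
        self.write_half, self.read_half = self.read_half, self.write_half


class MemoryController:
    """Hypothetical controller: swaps halves after both handshake events arrive."""

    EVENTS = {"DONECOMP", "DONEIN"}

    def __init__(self, memory):
        self.memory = memory
        self.pending = set()

    def signal(self, name):
        """Record an event; return True if the memory halves were switched."""
        if name not in self.EVENTS:
            raise ValueError("unknown handshake signal: " + name)
        self.pending.add(name)
        if self.pending == self.EVENTS:
            self.memory.swap()
            self.pending.clear()
            return True
        return False


# Memory 1 as in steps 10 and 11: the 16-point processor fills the right half
# while the 15-point processor reads the left half.
memory_1 = PingPongMemory(write_half="right", read_half="left")
controller_2 = MemoryController(memory_1)
controller_2.signal("DONECOMP")   # writer finished filling the right half
controller_2.signal("DONEIN")     # reader finished with the left half: switch
```

After the second event the 16-point processor writes to the left half and the 15-point
processor reads from the right half, which is the state described in step 11.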

3.4.  Data  Memory  Requirements. 

The data memory is implemented off-chip from the Winograd processors; there was
insufficient area on-chip for the WFTA circuitry and memory circuitry [6, 22]. The
memory must be organized into banks of 4080 24-bit words. The most significant bit will
be a parity bit (odd parity), while the remaining twenty-three (23) bits comprise a
twos-complement representation of the data. The parity bit is used to provide an
independent check on the memory circuits; odd parity is used so the "stuck at zero" or
"stuck at one" states (caused by memory power failure) may be detected. The memory
controllers monitor the DONE signals from the processors to the host. The controller
switches the status of the left and right halves of the memory (reference paragraph
3.2.2.). Read/Write signals are generated by the on-chip Winograd processor control
circuitry. The access time for the data memories must be less than 28 ns (two clock cycles;
reference paragraph 3.5.). Currently, off-the-shelf 256K-bit memories have access times
of approximately 40 ns [30]. If the trends in memory design hold to the same pattern
as the past ten years, memories with the required capacity and access times should be
available in 1987 [11, 19, 26]. Current 256K-bit memories use on-chip error-correcting
codes (ECC) to provide additional fault detection capability [23, 30]. Also, spare rows
or columns are provided for internal reconfiguration if necessary [23].

Data  memory  characteristics  are  summarized  in  Table  3-2. 
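
The reason odd parity catches both stuck states can be checked directly: a 24-bit word
of all zeros or all ones contains an even number of ones. The sketch below assumes the
word layout described above (parity in the most significant bit, 23 data bits below
it); the function names are hypothetical and are not taken from the VLSI design.

```python
DATA_BITS = 23
DATA_MASK = (1 << DATA_BITS) - 1
WORD_MASK = (1 << (DATA_BITS + 1)) - 1


def count_ones(value):
    """Number of one bits in a non-negative integer."""
    return bin(value).count("1")


def add_odd_parity(data):
    """Return a 24-bit word: parity bit in the MSB, 23 data bits below it."""
    data &= DATA_MASK
    parity = 0 if count_ones(data) % 2 == 1 else 1   # make the total count odd
    return (parity << DATA_BITS) | data


def parity_ok(word):
    """True if the 24-bit word contains an odd number of ones."""
    return count_ones(word & WORD_MASK) % 2 == 1


word = add_odd_parity(12345)
stuck_at_zero = 0x000000
stuck_at_one = 0xFFFFFF
```

Here `parity_ok(word)` holds, while both stuck words fail the check. Flipping a single
bit of `word` is detected, but flipping two bits is not, which is the limitation of
parity checking for multiple-bit errors.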

3.5.  Fault  Tolerance. 

Fault tolerance is the ability to operate in the presence of faults and provide useful
results to users [24]. Many ideas have been expressed on the issue of fault tolerance;
however, most deal with the concepts of fault avoidance, fault detection, and system
recovery [24, 27]. Before discussing these three concepts and their application to our
system design in the following paragraphs, some definitions of terms are presented.

Error: the resource (system) assumes an undesirable state.

Failure: the user (host) perceives the resource ceases to deliver an expected service.

Fault: the hypothesized cause of an error or failure [21].

Reliability: how often a component fails to perform its function [28].

Availability: the probability a system is operational at a given time [28].


            Data Memory Characteristics

    Characteristic        Type or Value

    Technology            CMOS
    Packaging             DIP
                          Pin-Grid Array
    Size                  4.5 sq. in. (a)
    Pin Count             48
    Access Time           25 ns
    Power                 100 mW (b)

    (a) area for 18-pin DIP
    (b) average for 256 Kbit chips

            Table 3-2


3.5.1.  Fault  Avoidance.  Fault  avoidance  increases  reliability  by  lessening  the  proba¬ 
bility  of  failures  and  errors.  Component  design  and  environmental  hardening  are  the  two 
most  popular  methods  used  for  fault  avoidance  [24].  Component  design  reduces  errors 
by  careful  signal  routing  and  increased  circuit  integration,  while  environmental  harden¬ 
ing  seeks  to  protect  the  system  from  outside  interference.  Some  examples  of  fault 
avoidance  are  single-chip  microprocessors  (component  design)  and  crystal  ovens  (environ¬ 
mental  hardening).  In  both  cases,  emphasis  is  placed  on  removing  the  causes  of  faults, 
rather  than  the  effects  of  the  faults. 

The  Winograd  processors  employ  component  design  to  achieve  fault  avoidance; 
however,  the  processors  must  be  fully  operational  before  fault  avoidance  is  realized.  To 
this end, chip testability is of prime importance [6, 22]. The individual chips must be
certified  as  fully  operational  with  no  faults  before  they  can  be  used  in  a  system.  The 
testability  of  the  Winograd  processors  is  enhanced  with  multiplexers  which  allow  the 
pins  to  operate  in  two  modes  (Test  and  Operate).  The  Test  mode  allows  test  vectors  to 
be  written  to  and  outputs  to  be  read  from  most  internal  circuits.  Individual  memory 
chips  also  must  be  capable  of  being  tested.  Since  commercially  available  memory  chips 
will be used, this feature is assumed to exist.

3.5.2. Fault Detection. Many techniques are available for fault detection; two of the
most popular are information coding (also called error-correction coding or ECC) and
consistency checking [12, 23]. Information coding may be used to detect and correct bit
errors.  Parity  checking  is  a  form  of  information  coding  which  provides  bit  error  detec¬ 
tion.  Although  simple  and  easy  to  implement,  parity  checking  has  a  limitation  in  that  it 
may  not  detect  multiple-bit  errors.  A  more  powerful  ECC  is  the  single-error  correction, 
double-error  detection  (SEC-DED)  Hamming  code.  This  code  may  be  used  with  parity 
checking  to  further  reduce  bit  error  probabilities.  In  the  PFA  system,  the  parity  check¬ 
ing  of  the  Winograd  processors  provides  an  independent  check  on  the  data  memories.  If 
one  memory  chip  should  suffer  catastrophic  failure,  the  on-chip  ECC  would  be  useless. 
The  Winograd  parity  checking  allows  detection  of  this  case. 

Consistency checking usually involves extra processors which operate in parallel
with active processors. These extra processors (sometimes referred to as watchdog
processors) compare the results from their computations and the results from the active
processor to determine if an error has occurred [12]. The Winograd processors have the
capability of operating actively or in the watchdog mode. The interface processor sends
control signals to the Winograd processors which configure the processors as either active
or watchdog. There is one active Winograd processor and two watchdog processors.
When the interface processor sends the OPERATE signal to the active processor, it also
sends OPERATE to the two watchdog processors. The watchdog processors receive the
same data as the active processor and compute the same DFT; however, the watchdogs
only monitor the output data lines of the active processor. An error signal is sent to the
interface processor if either one or both watchdog processors detect an error on the data
lines. Also, the watchdog processors monitor the input and output address lines of the
active processor and flag any addressing errors. Thus, the watchdog processors detect
both data and address errors of the active processor. Figure 3-4 shows the configuration
of the active and watchdog processors.

Figure 3-4 Active/Watchdog Processors


3.5.3. System Recovery. System recovery is the process of removing the effects of
detected faults from the system. This may be accomplished through space redundancy
(extra devices) or time redundancy (computation retry) [21]. Space redundancy may be
realized using multi-processor voting (using the majority "vote" of an odd number of
processors) or through systolic arrays of autonomous processors [10]. Time redundancy
requires enough time be scheduled to recompute the problem, lowering system
throughput. The problem may be recomputed using different components, a different
algorithm, or both [24]. One of the risks of computational retry is entering an infinite
loop when a hard fault exists (a hard fault requires hardware replacement to resume
proper operation). Space and time redundancy may be combined into a hybrid realization
of system recovery. Spare resources are switched from standby to active mode to
replace devices which have exhibited a given number of faults.

System  recovery  is  implemented  by  the  interface  processor.  The  interface  processor 
monitors  the  watchdog  error  lines  and  the  parity  error  lines  from  the  active  Winograd 
processor.  Parity  errors  are  recorded  and  may  be  used  for  memory  reconfiguration  (see 
below).  If  both  watchdog  processors  detect  the  same  error  condition,  the  interface  pro¬ 
cessor  may  remove  the  active  processor  from  the  system,  designate  one  of  the  watchdog 
processors  as  the  active  processor,  and  continue  operation  with  one  active  processor  and 
one  watchdog  processor  (rather  than  two  watchdog  processors  and  one  active  processor). 
If  only  one  watchdog  processor  detects  an  error,  the  interface  processor  may  restart  the 
computation  (issue  another  OPERATE  signal)  or  it  may  remove  the  watchdog  processor 
and  continue  operation  with  only  one  active  processor  and  one  watchdog  processor. 
Data  memories  may  employ  space  redundancy,  as  well  as  parity  and  error-correction 
coding. Extra memory locations may be used when on-chip ECC indicates several successive
errors [23] or when the interface processor has recorded several parity errors
from the same memory chip. A summary of the fault tolerant features incorporated into
the design of our VLSI circuit is given in Table 3-3.
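
The recovery choices open to the interface processor can be summarized as a small
decision routine. This is a sketch under stated assumptions, not the interface
processor's actual logic: the function names, the returned action labels, the
`retry_on_single` switch, and the parity-error threshold are all hypothetical, and the
text only says the interface processor "may" take each action.

```python
PARITY_ERROR_LIMIT = 3   # hypothetical threshold for "several" parity errors


def choose_recovery(watchdog_a_error, watchdog_b_error, retry_on_single=True):
    """Pick a recovery action from the two watchdog error lines."""
    if watchdog_a_error and watchdog_b_error:
        # Both watchdogs disagree with the active processor: replace it and
        # continue with one active processor and one watchdog processor.
        return "replace_active_with_watchdog"
    if watchdog_a_error or watchdog_b_error:
        # Only one watchdog disagrees: retry (time redundancy) or drop it.
        return "restart_computation" if retry_on_single else "remove_watchdog"
    return "continue"


def memory_needs_reconfiguration(parity_error_count):
    """True once a memory chip has accumulated the assumed number of parity errors."""
    return parity_error_count >= PARITY_ERROR_LIMIT
```

For example, `choose_recovery(True, True)` selects replacement of the active processor,
while `choose_recovery(True, False)` selects a restart of the computation.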

            Fault Tolerant Features

    Characteristic        Type or Value

    Fault Avoidance       Component Design (a) (b)
    Fault Detection       ECC (b)
                          Consistency Checking (a)
    System Recovery       Space Redundancy (a) (b)
                          Time Redundancy (a)

    (a) Winograd processors
    (b) data memories

            Table 3-3


3.6. Computational Throughput.

The Winograd processors should employ a clock rate of 70 MHz. A 70 MHz clock
rate means a clock cycle time of 14.3 ns. Thus, each 32-cycle period lasts 457.6 ns
(32 X 14.3 ns). The latency through the 16-point Winograd processor is 117 clock cycles;
thus, the 16-point processor takes:

2 X 117 = 234 (first and last sets of 16 data words)

253 X 32 = 8096 (255 - 2 = 253)

8096 + 234 = 8330 clock cycles

8330 X 14.3 ns = 119.12 μs

Similarly, the 15-point processor takes 119.06 μs and the 17-point processor takes 119.18
μs to complete their passes through the data. Thus, the latency through the 4080-point
PFA pipeline is:

119.12 + 119.06 + 119.18 = 357.36 μs

Adding 2.64 μs for overhead (handshaking, etc.) brings the latency through the 4080-point
PFA pipeline to 360 μs. However, once the pipeline is filled, DFT results will
appear at the output memory every 120 μs (8333 results per second). This throughput
will allow a 4080-point PFA array element (consisting of Winograd processors, dual-port
memories, and an interface processor) to meet many of the high data rate applications,
such as synthetic aperture radar (reference Chapter 4). The number of arithmetic operations
computed for each Winograd processor and the 4080-point system are given in
Table 3-4.
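
The timing arithmetic for the 16-point pass and the pipeline can be reproduced with a
short script. It is only a check of the figures quoted above: the variable and
function names are hypothetical, and the 15-point and 17-point pass times are taken
from the text rather than derived.

```python
CLOCK_NS = 14.3        # clock cycle time quoted for a 70 MHz clock
LATENCY_CYCLES = 117   # latency through the 16-point processor
PERIOD_CYCLES = 32     # cycles between successive 16-point DFTs
NUM_DFTS = 255         # 16-point DFTs in one pass over 4080 data words


def pass_cycles(num_dfts, latency, period):
    """Cycles for one pass: two latency terms plus one period per remaining DFT."""
    return 2 * latency + (num_dfts - 2) * period


cycles_16 = pass_cycles(NUM_DFTS, LATENCY_CYCLES, PERIOD_CYCLES)   # 8330
time_16_us = cycles_16 * CLOCK_NS / 1000.0                         # about 119.12

# Pass times for the 15-, 16-, and 17-point processors, in microseconds.
pass_times_us = [119.06, time_16_us, 119.18]
pipeline_latency_us = sum(pass_times_us)                           # about 357.36

# Once the pipeline is full, one result appears every 120 microseconds.
results_per_second = 1.0 / 120e-6                                  # about 8333
```

The latency is the sum of the three passes, while the steady-state throughput is set
by a single pass plus overhead, because the three processors work on different data
sets at the same time.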

            Rate of Arithmetic Operations

    Processor        MMPS (a)        MAPS (b)

    15               79              378
    16               43              323
    17               144             646
    4080             266             1347

    (a) Millions of Multiplications per Second
    (b) Millions of Additions per Second

            Table 3-4


3.7. Summary.

The material in this section has shown how the algorithms presented in Chapter 2
map into a VLSI architecture. The Winograd processors have three types of circuitry:
control, input/output, and arithmetic. There will be three phases of fabrication for
the Winograd processors; the ultimate design will use 1.25 μm CMOS, with the processor
and all peripheral devices mounted on a hybrid circuit package. The 4080-point PFA
system demonstrated how longer blocklength DFTs could be computed using the Winograd
processors, data memories, and an interface processor. The computational
throughput of the 4080-point PFA system, assuming the Winograd processors use an
internal clock rate of 70 MHz, will be 8300 DFTs per second (i.e., a new DFT result
every 120 μs). Thus, the ability to compute DFTs using VLSI hardware has been shown
to be feasible. The next chapter will show the circuits also provide the required
numerical accuracy for many applications.


Chapter 4

Numerical  Performance  of  Winograd  Processors 


4.1.  Overview. 

The  previous  chapter  discussed  the  transition  from  the  mathematical  algorithms 
presented in Chapter 2 to the VLSI architecture which implemented the algorithms. The
material  in  this  chapter  will  describe  how  well  the  VLSI  circuits  perform  the  DFT  com¬ 
putation,  using  numerical  accuracy  as  the  metric.  A  comparison  will  be  made  using 
software  programs  to  simulate  the  operation  of  the  VLSI  circuits  and  to  compute  a  stan¬ 
dard  result.  The  simulation  programs  use  integer  arithmetic  to  achieve  the  same  results 
as  the  VLSI  Winograd  processors,  while  the  standard  programs  use  double-precision  real 
arithmetic to provide "correct" results (within the limits of the computer system). The
flow  of  information  in  this  chapter  will  be  to  first  present  material  on  the  metric  used  for 
numerical  accuracy.  Next,  the  programs  used  for  the  comparison  are  discussed.  Finally, 
the  results  of  the  comparison  are  given. 

4.2. Signal-to-Noise Ratio and Noise Sources.

The metric used to measure the numerical accuracy between results from the simulation
and standard programs is the signal-to-noise ratio (SNR). The SNR has been used
for many years to measure the quality of communication systems. The SNR is the ratio
of the power in the signal to the power in the noise (interfering signal). Since the power
in the signal may be many orders of magnitude greater than the power in the noise, a
logarithmic form of the SNR is often used; this form of the SNR converts the dimensionless
ratio into a number with the units of decibels (dB). The formula for converting the
ratio (signal power to noise power) to the logarithmic form is shown below:

$$\mathrm{SNR}_{\mathrm{dB}} = 10 \log_{10} \left( \frac{P_{\mathrm{signal}}}{P_{\mathrm{noise}}} \right) \qquad (4\text{-}1)$$

The power in the signal is the sum of the power at each frequency component; the
power at each component is found by squaring the magnitude of the voltage signal at
that particular component. A similar computation is used to compute the power of the
noise. In this case, there is only one signal source, the "correct" result computed by the
standard portion of the program. However, there are several sources of noise, mostly
caused by the scaling required to prevent arithmetic overflow and the finite-length
coefficients used by the multipliers in the Winograd processors. These noise sources are
now discussed in greater detail.
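
The SNR computation described above (signal power from the standard result, noise
power from the difference between the standard and simulated results) can be sketched
as follows. The function names are hypothetical; this is not the listing from
Appendix B.

```python
import math


def power(samples):
    """Sum of squared magnitudes over all frequency components."""
    return sum(abs(x) ** 2 for x in samples)


def snr_db(standard, simulated):
    """SNR in dB: the standard output is the signal, the difference is the noise."""
    noise = [s - m for s, m in zip(standard, simulated)]
    noise_power = power(noise)
    if noise_power == 0:
        return math.inf          # identical outputs: no measurable noise
    return 10.0 * math.log10(power(standard) / noise_power)


# Signal power 100 and noise power 1 give a ratio of 100, i.e. 20 dB.
example_db = snr_db([10 + 0j, 0j], [9 + 0j, 0j])
```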

4.2.1. Scaling. The inputs to the arithmetic circuitry must be scaled down to avoid
arithmetic overflow and because the multipliers expect each input to have two sign
extensions [6]. Whenever a DFT is computed, the transform results experience arithmetic
growth; this is most easily seen by observing Parseval's relationship:

$$\sum_{k=0}^{N-1} |V_k|^2 = N \sum_{i=0}^{N-1} |v_i|^2 \qquad (4\text{-}2)$$

where

$v_i$ represents the input sequence;

$V_k$ represents the transformed sequence;

$N$ is the DFT blocklength.

The VLSI circuit uses a blocked floating-point number representation. This is similar to
an integer representation, but a number of bits are set aside for an exponent. Thus, a
number is represented by a fixed length mantissa (23 bits) and an exponent (a positive
3-bit number). For a 15-point DFT, the transform results may grow by a factor of
fifteen. This can be seen by looking at the pre-addition matrix of the 15-point DFT
(reference Appendix A). Since fifteen can be represented by four bits [6], the 15-point
Winograd processor appends five sign extensions to the inputs to the arithmetic
circuitry, four for arithmetic growth and one for the multipliers. The 16-point DFT
requires four sign extensions and the 17-point DFT requires six sign extensions (reference
paragraph 3.2.3.). The 16-point WFTA chip only needs three sign extensions to avoid
arithmetic overflow since the two inputs to the multipliers which have more than eight
terms are multiplied by trivial coefficients. Multiplication by trivial coefficients only
requires a shift, rather than passing the inputs through the multiplier circuitry. The
extra sign extensions are not required for all inputs; they are insurance against arithmetic
overflow. Thus, the numerical accuracy of the results suffers since all the sign
extensions are not used. On the average, two of the sign extensions will not be used.
Since random numbers are used, the output spectrum will be fairly flat, providing the
worst case for scaling. If the variance of each input is $\sigma_v^2$, then the
variance of the outputs ($\sigma_V^2$) can be found using (4-2); for the 16-point DFT:

$$\sum_{k=0}^{15} \sigma_V^2 = 16 \sum_{i=0}^{15} \sigma_v^2 \qquad (4\text{-}3)$$

or, collecting terms and noting $\sigma_v^2$ is the same for each term on the right-hand
side of (4-3):

$$\sigma_V^2 = 16\,\sigma_v^2, \qquad \sigma_V = 4\,\sigma_v \qquad (4\text{-}4)$$

Thus, the outputs are four times as large as the inputs, on the average. This uses only
two of the sign extensions provided to prevent arithmetic overflow; the other sign
extensions are basically wasted, as far as numerical accuracy is concerned. Therefore, there
will be two wasted sign extensions for the 15-point and 16-point simulations and three
wasted sign extensions for the 17-point simulation. This translates into a 12 dB loss in
SNR for the 15-point and 16-point simulations (1 bit = 6 dB [17]). Scaling provides the
largest source of noise in the Winograd processors.
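
The growth predicted by Parseval's relationship, and the SNR cost of unused sign
extensions, can be checked numerically. The sketch below uses a direct (unoptimized)
DFT rather than the Winograd algorithm, and all names are hypothetical. It assumes
each unused bit costs 20 log10(2), about 6.02 dB.

```python
import cmath
import math
import random


def direct_dft(v):
    """Direct evaluation of the DFT of the sequence v."""
    n = len(v)
    return [sum(v[i] * cmath.exp(-2j * cmath.pi * i * k / n) for i in range(n))
            for k in range(n)]


def energy(seq):
    """Sum of squared magnitudes of a sequence."""
    return sum(abs(x) ** 2 for x in seq)


def rms_growth(v):
    """Ratio of output to input RMS amplitude; Parseval gives sqrt(N)."""
    return math.sqrt(energy(direct_dft(v)) / energy(v))


def snr_loss_db(unused_bits):
    """SNR lost to unused sign extensions, at about 6.02 dB per bit."""
    return 20.0 * math.log10(2.0) * unused_bits


random.seed(1)
v = [complex(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(16)]
growth_16 = rms_growth(v)        # close to 4.0, i.e. two bits of growth
```

With two unused sign extensions `snr_loss_db(2)` is about 12 dB, and with three it is
about 18 dB.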


4.2.2.  Arithmetic  Roundoff.  Another  source  of  noise  is  created  when  the  32-bit 
results  from  the  arithmetic  circuitry  must  be  shortened  to  twenty-three  bits.  There  are 
two  choices;  the  results  may  either  be  truncated  or  rounded.  Truncation,  where  the 
least significant nine bits are simply ignored, gives an error of 1/2 bit (on the average).
Rounding, where the twenty-third bit is rounded up if the twenty-fourth bit is a one,
gives an average error of 1/4 bit. The noise power provided by this source is:

10 X log10(1) = 0 dB

Thus,  rounding  provides  better  arithmetic  performance  than  truncation.  However, 
rounding  is  more  difficult  to  implement,  requiring  a  latch  and  an  adder,  while  truncation 
requires  no  special  circuitry.  In  the  VLSI  implementation,  the  rounding  circuitry  added 
no significant delay to the propagation of the arithmetic results to the SIPO register;
thus,  rounding  was  chosen  instead  of  truncation.  Rounding  is  also  done  when  the  60-bit 
products  are  changed  to  thirty-two  bits  in  the  multipliers;  however,  this  source  of  noise 
is  almost  insignificant  since  the  results  are  rounded  to  twenty-three  bits  at  the  SIPO 
register.  Figure  4-1  depicts  rounding  at  the  SIPO. 
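
The difference between truncation and rounding when nine bits are dropped can be
illustrated with integer shifts. This is a sketch only: the function names are
hypothetical, and Python integers stand in for the 32-bit results (Python's `>>` is an
arithmetic shift, which matches two's-complement behavior).

```python
DROP_BITS = 9   # a 32-bit arithmetic result is shortened to 23 bits


def truncate(value):
    """Discard the nine least significant bits."""
    return value >> DROP_BITS


def round_half_up(value):
    """Truncate, then add one if the most significant discarded bit is a one."""
    round_bit = (value >> (DROP_BITS - 1)) & 1
    return (value >> DROP_BITS) + round_bit


def mean_abs_error(shorten):
    """Average error magnitude, in retained LSBs, over all discarded bit patterns."""
    scale = 1 << DROP_BITS
    total = sum(abs(shorten(r) - r / scale) for r in range(scale))
    return total / scale


truncation_error = mean_abs_error(truncate)        # just under 1/2 LSB
rounding_error = mean_abs_error(round_half_up)     # 1/4 LSB
```

Over all 512 discarded bit patterns the average error magnitude is just under one half
of a least significant bit for truncation and one quarter for rounding.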


4.2.3.  Finite-Length  Coefficients.  The  Winograd  processors  use  fixed  coefficients, 
which  are  hardwired  into  the  multipliers.  Coefficients  which  are  not  a  power  of  two  or 
combinations  of  powers  of  two  are  not  exactly  represented.  This  represents  a  noise 
source which could be very significant, since many multiplications are performed in the
DFT  computation.  To  reduce  the  effects  of  using  finite-length  coefficients,  the  coefficient 


real  arithmetic.  The  standard  module  was  compared  with  a  direct  implementation  of 
the DFT, using (2-1). The computer language chosen for the simulation programs was
'C'; 'C' possessed the numerical requirements discussed above and it was available on the
VAX 11/780 Scientific Support Computer (SSC) at the Air Force Institute of Technology.
The information on the simulation programs will be presented in the following
order: 

1)  Number  representation; 

2)  Differences  between  standard  and  simulation  modules; 

Listings  of  the  simulation  programs  are  given  in  Appendix  B.  The  flowgraph  of  the 
4080-point  simulation  program  is  shown  in  Figure  4-2. 

4.3.1. Number Representation. The VLSI circuits use a two's-complement notation
for  all  numbers;  the  most  significant  bit  indicates  the  sign  of  the  number,  while  the  other 
bits  indicate  the  magnitude.  Positive  numbers  have  zero  as  the  most  significant  bit;  the 
remaining  bits  represent  the  magnitude  in  the  obvious  manner.  Negative  numbers  are 
represented  by  complementing  a  positive  number  with  the  same  magnitude,  then  adding 
one.  Internal  results  are  thirty-two  bits  for  the  16-point  algorithm  (30  bits  for  the  15- 
point  and  34  bits  for  the  17-point),  coefficients  are  twenty-eight  bits,  and  input  data  are 
twenty-three  bits.  The  simulation  modules  represent  these  different  wordlengths  by 
declaring  all  variables  to  be  long  integers  (32  bits  on  the  VAX  SSC),  then  masking  the 
most significant bits to obtain the desired word length. The standard modules represent
all  variables  as  double-precision  real  numbers. 
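
Masking a wider integer down to a given two's-complement wordlength can be sketched as
below. The helper name is hypothetical, and the code illustrates the idea in Python
rather than reproducing the 'C' simulation modules.

```python
def wrap_twos_complement(value, bits):
    """Keep the low `bits` bits of value and interpret them as a signed number."""
    mask = (1 << bits) - 1
    value &= mask                      # discard everything above the wordlength
    sign_bit = 1 << (bits - 1)
    if value & sign_bit:               # most significant bit set: negative number
        return value - (1 << bits)
    return value


largest_23_bit = wrap_twos_complement((1 << 22) - 1, 23)   # 4194303
overflowed = wrap_twos_complement(1 << 22, 23)             # wraps to -4194304
```

The second example shows the wraparound that the sign extensions discussed in paragraph
4.2.1. are meant to prevent.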

4.3.2. Differences Between Standard and Simulation Modules. There are four major
differences between the standard and simulation modules of the simulation programs.
The simulation module requires a special multiply function and bit-level manipulations

4.3.2.1. Simulation Multiply Routine. The simulation module cannot multiply two
long integers using the intrinsic multiply function of 'C'; the multipliers in the VLSI
circuit round the results to thirty-two bits, rather than truncating them as the intrinsic
multiply function does. However, more importantly, the products from the VLSI
multipliers are sixty bits long; using the intrinsic multiply function in 'C' gives 64-bit
products since 'C' expects long integers to be thirty-two bits. Thus, a special multiply
function, which gives 32-bit results rounded down from 60-bit products, is used. The
algorithm used in the function is presented in Aho et al. [1]. The simulation multiplication
routine is shown in Figure 4-3.

4.3.2.2. Simulation Scaling and Rounding. The bit-level manipulations required by
the simulation modules are manifested in the scaling and rounding routines (reference
paragraphs 4.2.1. and 3.2.1.). Scaling is implemented as a logical shifting operation;
only the zero filling is done, since the 'C' language on the VAX SSC (also using two's-
complement notation) automatically supplies the sign extensions. Rounding is done by
masking the bit used for rounding decisions (using the & operator in 'C'), then
incrementing the result if the output of the masking operation is a logical one. The inputs to
the  standard  and  simulation  modules  were  outputs  of  the  random  number  generator 
intrinsic  to  C’,  masked  to  twenty-three  bits.  The  period  of  the  generator  (2  ')  was 
sufficient to ensure random inputs. The range of the inputs was ±1 to remove the DC
bias. 


4.3.2.3. Simulation Coefficients. The decimal coefficients of the standard Winograd
modules had to be changed to integer representation for the simulation module. The
VLSI circuit used 28-bit coefficients; thus, the simulation had to translate the decimal
coefficients into a usable form. First, the decimal coefficients were scaled down so all the
coefficients had a magnitude less than one. For the 15-point algorithm, the coefficients
were divided by four since the largest coefficient was 2.308. Similarly, the 16-point
coefficients were divided by two (largest coefficient was 1.383) and the 17-point
coefficients were divided by eight (largest was 4.081). After the coefficients were scaled
such that they were all between ±1, they were multiplied by 2^27, since 28-bit, two's-complement
numbers are bounded by 2^27 - 1 and -(2^27). The results from the 15-point and the
17-point modules had to be multiplied by factors of two and four, respectively, to
provide a common reference with the 16-point module.

Figure 4-3 Simulation Multiplying
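
The coefficient conversion described above (divide by a power of two, then scale to a
28-bit two's-complement integer) can be sketched as follows. The function names are
hypothetical, and the use of round-to-nearest is an assumption; the text does not say
whether the coefficients were rounded or truncated.

```python
COEFF_BITS = 28
SCALE = 1 << (COEFF_BITS - 1)     # 2**27
MAX_COEFF = SCALE - 1             # largest 28-bit two's-complement value
MIN_COEFF = -SCALE                # smallest 28-bit two's-complement value


def quantize(coefficient, divisor):
    """Scale a decimal coefficient below one in magnitude, then convert to 28 bits."""
    scaled = coefficient / divisor
    if not -1.0 < scaled < 1.0:
        raise ValueError("divisor too small for this coefficient")
    value = round(scaled * SCALE)             # assumed rounding to nearest
    return max(MIN_COEFF, min(MAX_COEFF, value))


def dequantize(value, divisor):
    """Recover the decimal coefficient represented by a 28-bit integer."""
    return value / SCALE * divisor


q15 = quantize(2.308, 4)          # largest 15-point coefficient, divided by four
error_15 = abs(dequantize(q15, 4) - 2.308)
```

The representation error is bounded by half of one 28-bit step times the divisor,
which is the finite-length coefficient noise discussed in paragraph 4.2.3.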


4.3.2.4. Simulation Versus Standard Outputs. Since the simulation modules incorporated
scaling and integer multiplication, the outputs from the simulation module were
of a different order of magnitude than the results from the standard module. The inputs
to the simulation module were shifted left four places (scaled up by 2^4 = 16). The
simulation multiplier results were twice as large as they should have been since the
simulation module shifted the results to the right twenty-six places, while the decimal
coefficients of the standard module were multiplied by 2^27 to represent the 28-bit
coefficients used in the VLSI circuits. The results of the arithmetic circuitry were shifted
right by 9 places (scaled down by 2^9 = 512). Thus, the net effect of the input scaling,
integer multiplication, and SIPO rounding was:

$$\frac{\text{input} \times 16 \times 2}{512} = \frac{\text{input}}{16} \qquad (4\text{-}6)$$

Thus, the standard results were sixteen times as large as the outputs from the simulation
module. The standard results were divided by sixteen, rather than scaling the simulation
inputs up by sixteen. This was done since the VLSI circuit will compute the DFT
according to the algorithm used by the simulation module; thus, the numerical accuracy
should be measured using those results, rather than a shifted version of the results.


4.4.  Simulation  Results. 


There were two types of simulation results, one when the results from the standard
modules were compared with results from the direct DFT and the other when the standard
results were compared with the simulation results. In both cases, the signal-to-noise
ratio in dB was used.
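
As a reminder of what the metric measures, the SNR can be written in the usual power-ratio form. This is a sketch under assumptions: the symbols $s_k$ (reference outputs), $x_k$ (outputs under test), and $N$ (blocklength) are introduced here for illustration, and the $10\log_{10}$ power form is the conventional definition rather than one quoted from the thesis.

$$\text{SNR}_{\text{dB}} = 10 \log_{10} \left( \frac{\sum_{k=0}^{N-1} s_k^{2}}{\sum_{k=0}^{N-1} (s_k - x_k)^{2}} \right)$$

Under this form the signal is the reference output and the noise is the difference between the two sets of outputs, evaluated separately for the real and imaginary parts.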

4.4.1. Standard Versus Direct DFT. The results from the standard Winograd
modules were compared with results from the direct DFT for six different blocklengths.
For each blocklength, 100 different sets of random-number inputs were used and the
average SNR computed for both the real and imaginary outputs. Table 4-1 shows the
results for this portion of the testing.

The decrease from approximately 275 dB for the single-factor DFTs to 175 dB for
the double-factor DFTs was probably due to the arithmetic roundoff in the direct DFT.
Since the direct DFT performed many more operations than the Winograd modules, the
cumulative roundoff from the computer increased, possibly overwhelming the noise due
to the difference between the results from the two routines. Still, the results show the
standard modules were computing the DFT, rather than some other mathematical
function.


Standard Versus Direct DFT Results (dB)

4.4.2. Standard Versus Simulation Results. For this area of testing, the DFT blocklengths
were 15, 16, and 17. Again, 100 cases of random-number inputs were used to
compute an average SNR for each blocklength. Results are shown in Table 4-2.


The results from this comparison agree well with the theoretical expectations given
in paragraph 4.2. Recall from paragraph 4.2. that the loss due to scaling was 12 dB for the
15-point and 16-point simulations and 18 dB for the 17-point simulation. For all three
simulations, the loss due to SIPO rounding was approximately zero dB. Since the outputs
were twenty-three bits, the output signal power should have been:


23 bits × 6 dB/bit = 138 dB


Thus, the theoretical output SNR for the 15-point and 16-point simulations should have
been 126 dB and the output SNR for the 17-point simulation should have been 120 dB.
These results compare well with results from other VLSI circuits, notably the CUSP chip
(94 dB SNR for a 16-point DFT using a 20-bit number representation [21]).
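
The arithmetic behind these expectations is simple enough to sketch in a few lines. The function name `theoretical_snr_db` is a hypothetical helper introduced for illustration; the 6 dB/bit rule, the 23-bit output width, and the 12 dB and 18 dB scaling losses are the figures quoted above.

```c
#include <assert.h>

#define DB_PER_BIT 6  /* rule of thumb used in the text: about 6 dB per bit */

/* Hypothetical helper: expected SNR for a given output width and scaling loss. */
int theoretical_snr_db(int output_bits, int scaling_loss_db)
{
    assert(output_bits > 0 && scaling_loss_db >= 0);
    return output_bits * DB_PER_BIT - scaling_loss_db;
}
```

Calling `theoretical_snr_db(23, 12)` gives 126 for the 15-point and 16-point simulations, and `theoretical_snr_db(23, 18)` gives 120 for the 17-point simulation.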

4.4.3. Application Accuracy Requirements. The preceding paragraphs showed how
well the Winograd processors performed the DFT computation. But do these results
indicate our VLSI circuit can be used for applications which need high accuracy? A
typical synthetic aperture radar (SAR) example may provide an answer (reference
Table 4-3 for SAR parameters, taken from Hovanessian [9]).

SAR Parameters

Parameter             Value

PRF                   1225 pps
Pulse Width           33.8 μs
Transmit Frequency    1275 MHz
Vehicle Velocity      7.14 km/s
Vehicle Altitude      375 km
Antenna Width         2 m
Range Beamwidth       6.73°
Range Width           100 km
Azimuth Length        10.9 km

Table 4-3


The time to fly one range beamwidth is 1.53 seconds. The sampling rate is 12
megawords per second (i.e., 12 million 23-bit complex words are collected from the sensor
each second). Since a 4080 × 4080 image is collected every 1.53 seconds, only one PFA
array element (reference Chapter 3) is required to keep pace with the sensor data output
(reference paragraph 3.6.). The 23-bit number representation provides a potentially
greater accuracy than current 16- or 20-bit systems. For a 16-bit system, the best SNR
which can be expected (using the 6 dB/bit rule) is 96 dB. Thus, even without taking
noise into account, the VLSI architecture given in Chapter 3 has a potentially better
numerical accuracy than a 16-bit system. Although comparable, or even better, accuracy
can be obtained using general-purpose computers, the PFA array element can be
placed on the vehicle for real-time signal processing. Also, the PFA array element can
be used to code the image data sent back to the user (using transform coding techniques
[18]).


4.5.  Summary. 


In this section, the methods of finding the numerical accuracy of the VLSI circuits,
using software simulations, have been described. First, the metric for determining the
accuracy, the signal-to-noise ratio, was presented. Then, two of the major sources of
noise, scaling and SIPO rounding, were discussed. Next, the simulation programs were
given, with special attention paid to the differences between the standard and simulation
modules. The results of the standard module compared favorably with outputs from the
direct computation of the DFT, showing the standard module did indeed compute the
DFT. Finally, the standard and simulation results were compared; excellent agreement
with theoretical expectations was noted. Also, the numerical accuracy of the simulation
results compares favorably with other VLSI DFT processors and shows the VLSI chips
can satisfy the accuracy requirements of SAR signal processing.

Chapter 5

Results,  Conclusions,  and  Recommendations 

5.1.  Overview. 

The previous chapters have shown the transition from mathematical algorithms to
a VLSI implementation of those algorithms and how well the VLSI implementation
performs, using numerical accuracy as the metric. The material in this chapter gives results
of the research, presents conclusions of the research, and discusses recommendations for
future research.

5.2.  Results. 

There were three results from this research: 1) determining the accuracy of the
simulation programs; 2) verifying the accuracy of the standard against which the simulation
results were measured; and 3) showing the VLSI architecture of the Winograd and
PFA processors to be viable. The metric used to determine the numerical accuracy was
the signal-to-noise ratio (SNR) between the outputs from a standard module and the
difference between standard and simulation modules. The SNR for the 15-point and
16-point modules averaged 127 dB (11 dB down from the standard), while the SNR for the
17-point module averaged 121 dB (18 dB down from the standard). These results were
in excellent agreement with theoretical expectations (i.e., losses due to arithmetic rounding
and scaling).

The outputs from the standard module were checked against results from a direct
computation of the DFT. Comparison of the Winograd modules (15, 16, and 17) with
the direct DFT showed an SNR in excess of 270 dB. The SNR of the direct DFT and
the 2-factor PFA modules (240, 255, and 272) was 170 dB. These results indicate the
standard modules were computing the DFT, rather than some other mathematical
operation.


The VLSI Winograd processors operate autonomously, requiring simple handshaking
with an interface processor. The bit-serial architecture of the arithmetic circuitry
allowed fast internal clock rates (targeted 70 MHz for 1.25 μm CMOS circuits) and high
throughput (over 8300 4080-point DFTs computed per second if the PFA pipeline
architecture is used). Fault tolerance was integral to the design effort, with watchdog processors
and parity checking enabling the detection of errors in the data, addressing, and memory
circuits. As seen in Chapters 3 and 4, the throughput and numerical accuracy of the
proposed VLSI chips meet existing SAR requirements.

5.3.  Conclusions. 

There  are  conclusions  to  be  drawn  from  each  area  of  the  research  (theory,  VLSI 
implementation,  and  numerical  simulation). 


5.3.1.  Theory: 

1) Winograd modules provide for the most efficient realization of the Good-Thomas
PFA in terms of number of multiplications (i.e., using Winograd modules
guarantees the fewest multiplications for a given DFT blocklength used in the
Good-Thomas PFA);

2) The use of the Good-Thomas PFA allows for smaller multiplication matrices,
compared to using the large Winograd algorithm; thus, even though the large Winograd
algorithm actually has fewer multiplications, the Good-Thomas PFA may be preferred
because the multiplication matrices are smaller (easier to store in a memory and
implement in a VLSI circuit).


5.3.2.  VLSI  Implementation: 


1) Using the Good-Thomas PFA with Winograd modules allows an efficient use of
area on the chip; the VLSI circuit described in Chapter 3 can perform more multiplications
per square centimeter of silicon for a given DFT blocklength than any other VLSI
circuit implementing a DFT algorithm;

2) The use of Winograd processors with dual-ported memories allows for several
DFT blocklengths to be computed using the same building blocks (e.g., a 255-point DFT
can be performed by combining the 15-point and 17-point processors with the appropriate
memories; the addressing and control circuitry exist on the processors themselves to
perform the different DFT blocklengths [22]);

3) The design of watchdog and active processors allows for a fault tolerant architecture
which can withstand several faults and continue to deliver correct results. The
interface processor can reconfigure the processors, taking the faulty processor off-line and
replacing it with one of the watchdog processors (other fault tolerant characteristics of
the VLSI design are the use of parity in the data representation and the use of on-chip
error-correction coding (ECC) on the data memories).
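
Combining processors in this way relies on the Good-Thomas requirement that the factor blocklengths be pairwise relatively prime, which holds for 15, 16, and 17. The check can be sketched as follows; the function names `gcd` and `pairwise_coprime` are illustrative and do not appear in the thesis.

```c
#include <assert.h>

/* Greatest common divisor by Euclid's algorithm. */
int gcd(int a, int b)
{
    assert(a >= 0 && b >= 0);
    while (b != 0) {
        int t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* Returns 1 if every pair of factors has gcd 1, otherwise 0. */
int pairwise_coprime(const int *factors, int count)
{
    for (int i = 0; i < count; i++)
        for (int j = i + 1; j < count; j++)
            if (gcd(factors[i], factors[j]) != 1)
                return 0;
    return 1;
}
```

With the factors 15, 16, and 17 the check succeeds, which is why the same three processors can serve the blocklengths 240, 255, 272, and 4080.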


5.3.3.  Numerical  Simulation: 

1)  The  most  significant  source  of  noise  in  the  simulation  is  the  scaling  of  the  inputs 
to  prevent  arithmetic  overflow; 

2) The standard module does compute the DFT;

3) The outputs from the standard module must be scaled for comparison with outputs
from the simulation module;

4) According to the simulation results, the VLSI circuit should provide acceptable
numerical accuracy for DFT blocklengths of 15, 16, and 17.

5.4. Recommendations.

The  recommendations  for  future  research  fall  into  three  categories:  theory,  VLSI 
implementation,  and  numerical  simulation. 

5.4.1. Theory. The recommendation for this area of the research is to verify the
17-point small Winograd algorithm. The algorithm used in the standard and simulation
modules was adapted from Burrus and Johnson [13]; that algorithm was a modified
Winograd algorithm. The modifications were matrix manipulations which changed some
of the arithmetic operations.


5.4.2.  VLSI  Implementation: 

1) Verify the control signals and their timing for all three Winograd processors;

2) Design the Winograd processors for fabrication;

3) Design a test plan (including test procedures and test vectors) for the fabricated
processors.

5.4.3. Numerical Simulation:

1) Compute the numerical accuracy for the following DFT blocklengths: 240, 255,
272, and 4080;

2) Determine the effects of different coefficient wordlengths (20, 24, 28, and 32) on
the numerical accuracy for each DFT blocklength to determine if a shorter coefficient
wordlength can be used;

3) Determine the numerical accuracy of each DFT blocklength for several different
inputs, such as sine waves, pulses, etc.

Appendix A

A  15-Point  DFT  Using  Winograd’s  Large  DFT  Algorithm 


A 15-point DFT is developed using Winograd's large DFT algorithm (reference
paragraph 2.3.2.). The method requires the use of 3-point and 5-point small Winograd
modules; the equations describing these modules can be found in McClellan and Rader []
or Winograd []. The development of the 15-point DFT will take two steps:

1) Show the DFT can be written as a Kronecker product of the 3-point and
5-point small Winograd modules;

2) Expand the Kronecker product of the 3-point and 5-point small Winograd
modules back into a DFT form.

To begin, the matrix representation of a 15-point DFT is given in Figure A-1.

The ω terms in the matrix (Figure A-1) are a mathematical shorthand for expressing
the complex exponential terms in the DFT. The arguments of the exponentials are
multiples of (2π/15); if the DFT is thought of as sampled values of the z-transform, the
unit circle is segmented into fifteen equal increments over the range (0, 2π). The values
of the ω terms will be used later for comparison with the results from the Kronecker
product of the 5-point and 3-point modules; the values of the ω terms are given below:

Figure A-1 Matrix Representation of DFT

$$\omega^{k} = \cos\left(-\frac{2\pi k}{15}\right) + j\sin\left(-\frac{2\pi k}{15}\right), \qquad k = 0, 1, \ldots, 14$$

The rows and columns of the matrix in Figure A-1 must be scrambled according to
the rules shown in Figures 2-2 and 2-1, respectively. The row scrambling corresponds to

Figure A-3 Row Scrambling of DFT Matrix

Looking at Figure A-3, one can see a pattern of sub-matrices within the DFT
matrix; these sub-matrices are the result of the Kronecker product of the 3-point and
5-point small Winograd modules. The Kronecker product of the 3-point and 5-point
Winograd modules is shown in Figure A-4.
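
The reason scrambling exposes this sub-matrix pattern can be sketched with the standard Good-Thomas index maps for 15 = 3 × 5. This is an illustration under assumptions: the maps in Figures 2-1 and 2-2 are not reproduced here, so the particular maps below (and the names `input_index`, `output_index`, and `maps_factor_dft`) are one common choice rather than necessarily the ones used in the thesis.

```c
#include <assert.h>

enum { N1 = 3, N2 = 5, N = N1 * N2 };

/* Smallest t in [1, m) with (a * t) % m == 1, or 0 if no inverse exists. */
int mod_inverse(int a, int m)
{
    for (int t = 1; t < m; t++)
        if ((a * t) % m == 1)
            return t;
    return 0;
}

/* Input map: n = (N2*n1 + N1*n2) mod N. */
int input_index(int n1, int n2)
{
    return (N2 * n1 + N1 * n2) % N;
}

/* Output map from the Chinese remainder theorem: k mod N1 = k1, k mod N2 = k2. */
int output_index(int k1, int k2)
{
    int t1 = mod_inverse(N2 % N1, N1);
    int t2 = mod_inverse(N1 % N2, N2);
    assert(t1 != 0 && t2 != 0);  /* requires N1 and N2 relatively prime */
    return (N2 * t1 * k1 + N1 * t2 * k2) % N;
}

/* Returns 1 if the exponent n*k always splits into 5*n1*k1 + 3*n2*k2 (mod 15),
   i.e. each 15-point kernel term is a 3-point term times a 5-point term. */
int maps_factor_dft(void)
{
    for (int n1 = 0; n1 < N1; n1++)
        for (int n2 = 0; n2 < N2; n2++)
            for (int k1 = 0; k1 < N1; k1++)
                for (int k2 = 0; k2 < N2; k2++) {
                    int n = input_index(n1, n2);
                    int k = output_index(k1, k2);
                    if ((n * k) % N != (N2 * n1 * k1 + N1 * n2 * k2) % N)
                        return 0;
                }
    return 1;
}
```

Because the cross terms vanish modulo 15, every entry of the scrambled matrix is a product of one 3-point kernel value and one 5-point kernel value, which is exactly the structure of a Kronecker product.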

Thus, the 15-point DFT can be written as the Kronecker product of the 5-point
and 3-point small Winograd modules. The matrix representations for the 5-point and
3-point modules are shown in Figures A-5a and A-5b, respectively.


The output vector, V, may be expressed, using the associative property of
Kronecker products for matrices, as follows:


Figure A-4 Kronecker Product of 5-Point and 3-Point
Small Winograd Modules

Figure A-5a Matrix Representation of 5-Point
Small Winograd Module


Figure A-5b Matrix Representation of 3-Point
Small Winograd Module

$$
\begin{aligned}
V &= (V_1 \otimes V_2)\, v \\
  &= [(C_1 B_1 A_1) \otimes (C_2 B_2 A_2)]\, v \\
  &= [(C_1 \otimes C_2)(B_1 \otimes B_2)(A_1 \otimes A_2)]\, v
\end{aligned}
$$

Thus, the pre-addition matrix of the 15-point DFT (Figure A-6a) is the Kronecker
product of the 5-point and 3-point pre-addition matrices (Figure A-6b). The multiplicative
and post-addition matrices for the 15-point DFT (Figures A-7a and A-8a, respectively)
are also Kronecker products of their 5-point and 3-point counterparts (Figures A-7b
and A-8b, respectively).
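
The step that lets the Kronecker product of two factored modules be regrouped factor by factor is often called the mixed-product property, (A ⊗ B)(C ⊗ D) = (AC) ⊗ (BD). A minimal sketch on small integer matrices is given below; the function names and the 2 × 2 size are assumptions chosen to keep the example short, not the 3-point and 5-point matrices of the thesis.

```c
#include <assert.h>
#include <stddef.h>

/* out = a (ra x ca) kron b (rb x cb); all matrices are row-major. */
void kron(const int *a, size_t ra, size_t ca,
          const int *b, size_t rb, size_t cb, int *out)
{
    assert(a && b && out);
    size_t cols = ca * cb;
    for (size_t i = 0; i < ra; i++)
        for (size_t j = 0; j < ca; j++)
            for (size_t p = 0; p < rb; p++)
                for (size_t q = 0; q < cb; q++)
                    out[(i * rb + p) * cols + (j * cb + q)] =
                        a[i * ca + j] * b[p * cb + q];
}

/* out = a * b for square n x n row-major matrices. */
void matmul(const int *a, const int *b, size_t n, int *out)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++) {
            int sum = 0;
            for (size_t k = 0; k < n; k++)
                sum += a[i * n + k] * b[k * n + j];
            out[i * n + j] = sum;
        }
}

/* Returns 1 if (A kron B)(C kron D) equals (AC) kron (BD) for 2 x 2 inputs. */
int mixed_product_holds(const int A[4], const int B[4],
                        const int C[4], const int D[4])
{
    int AkB[16], CkD[16], lhs[16], AC[4], BD[4], rhs[16];
    kron(A, 2, 2, B, 2, 2, AkB);
    kron(C, 2, 2, D, 2, 2, CkD);
    matmul(AkB, CkD, 4, lhs);
    matmul(A, C, 2, AC);
    matmul(B, D, 2, BD);
    kron(AC, 2, 2, BD, 2, 2, rhs);
    for (size_t i = 0; i < 16; i++)
        if (lhs[i] != rhs[i])
            return 0;
    return 1;
}
```

Applying the same identity twice to the three-factor modules gives the grouping of post-addition, multiplicative, and pre-addition matrices described above.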

Figure A-6a Pre-Addition Matrix for 15-Point DFT

Figure A-8a Post-Addition Matrix for 15-Point DFT


Figure A-8b Kronecker Product of 5-Point and 3-Point
Post-Addition Matrices

To determine if the Kronecker product of the 5-point and 3-point modules does
result in a 15-point DFT, take the product of all three matrices to see if the resulting
matrix is identical to the scrambled DFT matrix (reference Figure A-3). First, take the
product of the pre-addition and multiplicative matrices.


Multiply the product shown in Figure A-9 by the post-addition matrix (reference
Figure A-8a).

Figure A-9 Product of Pre-Addition and Multiplicative Matrices

Figure A-10 Product of Pre-Addition, Multiplicative,
and Post-Addition Matrices

The  expressions  for  the  elements  of  the  matrix  shown  in  Figure  A-10  are  given  on 
the  following  pages: 

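$$
\begin{aligned}
E_{0,0} &= E_{0,1} = E_{0,2} = E_{0,3} = E_{0,4} = 1 = \omega^{0} \\
E_{0,5} &= E_{0,6} = E_{0,7} = E_{0,8} = E_{0,9} = 1 = \omega^{0} \\
E_{0,10} &= E_{0,11} = E_{0,12} = E_{0,13} = E_{0,14} = 1 = \omega^{0}
\end{aligned}
$$

$$
\begin{aligned}
E_{1,0} &= E_{1,3} = E_{1,6} = E_{1,9} = E_{1,12} = 1 = \omega^{0} \\
E_{1,1} &= E_{1,4} = E_{1,7} = E_{1,10} = E_{1,13} = 1 - 1.5 - j0.87 = -0.5 - j0.87 = \omega^{5} \\
E_{1,5} &= E_{1,8} = E_{1,11} = E_{1,14} = 1 - 1.5 + j0.87 = -0.5 + j0.87 = \omega^{10}
\end{aligned}
$$

$$
\begin{aligned}
E_{2,3} &= E_{2,6} = E_{2,9} = E_{2,12} = 1 = \omega^{0} \\
E_{2,4} &= E_{2,7} = E_{2,10} = E_{2,13} = 1 - 1.5 + j0.87 = -0.5 + j0.87 = \omega^{10} \\
E_{2,5} &= E_{2,8} = E_{2,11} = E_{2,14} = 1 - 1.5 - j0.87 = -0.5 - j0.87 = \omega^{5}
\end{aligned}
$$

$$
\begin{aligned}
E_{3,1} &= E_{3,2} = 1 = \omega^{0} \\
E_{3,4} &= E_{3,5} = 1 - 1.25 - j1.53 + 0.56 + j0.59 = 0.31 - j0.95 = \omega^{3} \\
E_{3,7} &= E_{3,8} = 1 - 1.25 - 0.56 - j0.59 = -0.81 - j0.59 = \omega^{6} \\
E_{3,10} &= E_{3,11} = 1 - 1.25 - 0.56 + j0.59 = -0.81 + j0.59 = \omega^{9} \\
E_{3,13} &= E_{3,14} = 1 - 1.25 + j1.53 + 0.56 - j0.59 = 0.31 + j0.95 = \omega^{12}
\end{aligned}
$$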
l=u/> 

1  -  1.5  -  j0.87  =  -0.5  -  j0.87  =  J 

1  -  1.5  T-  j0.87  :  -0.5  4”  j 0  8 1  — 

i  -  1.25  -  j l .53  -  0.56  j0.59  =  0.31  -j0.95  =  J* 

1  -  1.5  -  j0.87  -  1.25  +  1.88  -  j  1.08  -  jl  .53  -  j2.3l  -  1.33 

-  0.56  -  0.84  -  j0.48  -  jO.59  -  j0.88  -  0.51  =  -0.98  -  jO  2 1  = 

1  -  1.5  -  j0.87  -  1.25  -  1.88  -  j  1.08  -  j  l  .5-3  -  j2.31  -  1.33 

-  0.56  -  0.84  -  j0.48  -  jO.59  -  j0.88  -  0.51  =  0.67  -  jO  7  1  =  . 

1  -  1.25  -  0.56  -  jO.59  =  -0.81  -  jO.59  =  J 

1  -  1.5  -  j0.87  -  1.25  *  1.88  -  j  1 .08  -  0.50  -  0.84  -  jO  48 

-  jO.59  -  j0.88  -  0.51  =  -0.10  -  j0.99  =  J' 




E,  „  =  1  -  1.5  -  j0.87  -  1.25  -r  1.88  -  jl.08  4-  0.56  -  0.84  -r  j0.48 

■i  .o 

-  j0.59  4-  j0.88  4-  0.51  =  0.92  -  j0.41  =  w1 
E<e  =  1  -  1.25  -  0.56  +  j0.59  =  -0.81  4-  j0.59  =  J 

E4  1Q  =  1  -  1.5  -  jO.87  -  1.25  4-  1.88  4-  jl.08  4-  0.56  -  0.84  -  j0.48 
4-  j0.59  -  j0.88  4-  0.51  =  0.92  4-  j0.41  =  w14 

E4  n  =  1  -  1.5  4-  j0.87  -  1.25  4-  1.88  -  jl.08  4-  0.56  -  0.84  4-  j0.48 
4-  j0.59  -  j0.88  -  0.51  =  -0.10  -  j0.99  =  u* 

E4  i2  =  i  -  1.25  4-  j  1 .53  4-  0.56  -  j0.59  =  0.31  -j0.95  =  u12 

E4  13  =  1  -  1.5  -  jO.87  -  1.25  4-  1.88  4-  jl.08  +  jl.53  -  j2.31  4-  1.33 

4-  0.56  -  0.84  -  j0.48  -  jO.59  -r  j0.88  -  0.51  =  0.67  -  j0.74  =  u;2 

E4  H  =  1  -  1.5  +  jO.87  -  1.25  4-  1.88  -  jl.08  4-  jl.53  -  j2.31  -  1.33 

-4  0.56  -  0.84  4-  j0.48  -  j0.59  4-  j0.88  4-  0.51  =  -0.98  -  j0.21  =  J 

E:,o  =  *  =  "° 

Eg  J  =  1  -  1.5  4-  jO.87  =  -0.5  4-  jO.87  =  ^10 
E.  2  =  1  -  1.5  -  j0.87  =  -0.5  -  jO.87  = 

E-3  =  1  -  1.25  -  jl.53  +  0.56  -  j0.59  =  0.31  -j0.95  =  ~3 

E.  4  —  1  -  1.5  -  jO.87  -  1.25  -  1.88  -  j  1.08  -  jl.53  -  j2.31  -  1.33 

-  0.56  -  0.84  -  j0.48  -  j0.59  -  j0.88  -  0.51  =  0.67  -  j0.74  =  *13 

E.  .  =  1  -  1.5  -  j0.87  -  1.25  -  1.88  -  jl.08  -  jl.53  -  j2.31  -  1.33 

-  0.56  -  0.84  -  j0.48  *  j0.59  -  jO  88  -  0,51  =  -0.98  -  j0.21  = 

E.  8  =  1  -  1.25  -  0.56  -  jO  59  =  -0.81  -  j0.59  =  J5 




E,  „  =  1  -  1.5  4-  jO. 87  =  -0.5  4-  j0.87  =  u 


=  ,J0 


E73  =  1  -  1.25  -  0.56  -  j0.59  =  -0.81  -j0.59  = 


E74  =  1  -  1.5  -  j0.87  -  1.25  +  1.88  +  jl.08  -  0.56  4-  0.84  -f  j0.48 
-  j0.59  4-  j0.88  -  0.51  =  -0.10  4-  j0.99  =  w11 


E75  =  1  -  1.5  +  j0.87  -  1.25  +  1.88  -  jl.08  -  0.56  +  0.84  -  j0.48 
-  jO.59  4-  jO.88  4-  0.51  =  0.92  -  j0.41  =  u;1 


E76  =  1  -  1.25  4-  j0.36  4-  0.56  -  jO.59  =  0.31  4-  j0.95  =  w12 


E77  =  1  -  1.5  -  j0.87  -  1.25  +  1.88  +  jl.08  4-  0.56  -  0.84  -  j0.48 

4-  j0.36  -  j0.54  4-  0.31  4-  jO.59  -  jO.88  4-  0.51  =  0.67  -  j0.74 


E7j=  1-  1.5+  j0.87  -  1.25  4-  1.88  -  jl.08  -+  0.56  -  0.84  4-  j0.48 

4-  j0.36  -  j0.54  -  0.31  +  jO.59  -  jO.88  -  0.51  =  -0.98  -  j0.21  =  J 


E7g  =  l  -  1.25  -  0.56  -  jO.36  -  jO.59  =  0.31  -  j0.95  =  J 


E?  1Q  =  1  -  1.5  -  j0.87  -  1.25  -4  1.88  4-  jl.08  4-  0.56  -  0.84  -  j0.48 

-  jO.36  4-  j0.54  -  0.31  -  jO.59  4-  jO.88  -  0.51  =  -0.98  -  j0.21  = 


E7  u  =  1  -  1.5  -  j0.87  -  1.25  4-  1.88  -  jl.08  -  0.56  -  0.84  -  j0.48 

-  jO.36  -  j0.54  +  0.31  -  jO.59  -  jO.88  -r  0.51  =  0.67  -  j0.74  =  ^‘3 


E-  12  =  l  -  1.25  -  0.56  -  jO.59  =  -0.81  ~jO  59  =  J 


E„  =  1  -  1.5  -  j0.87  -  1.25  -  1.88  -  jl.08  -  0.56  -  0.84  -  jO.  18 


-  jO.59  -  jO.88  -  0.51  -  0.92  -  j0.41  =  ~14 


E.  =  l  -  1.5  -  j0.87  -  1.25  -  1.88  -  jl.08  -  0.56  -  0.84  -  jO  18 


-  jO.59  -  jO.88  -  0.51  =  -0.10  -  j0.99  =  J 


E  =  1  =  J’ 




1  -  1.5  ^  j0.87  =  -0.5  -4-  jO.87  =  ujiu 

1  -  1.5  -  j0.87  =  -0.5  -  jO.87  =  u;s 

l  -  1.25  -  0.56  -  jO.59  =  -0.81  -  j0.59  =  w6 

1  -  1.5  -4  j0.87  -  1.25  -  1.88  -  jl.08  -  0.56  4-  0.84  -  j0.48 
-  jO.59  4-  j0.88  0.51  =  0.92  -  j0.41  =  w1 


1  -  1.5  -  j0.87  -  1.25  -r  1.88  -  jl.08  -  0.56  -  0.84  4-  j0.48 
-  jO.59  -4  j0.88  -  0.51  =  -0.10  +  jO.99  =  ojn 

1  -  1.25  4-  0.56  4-  j0.36  4-  jO.59  =  0.31  4-  j0.95  =  u>12 


1  -  1.5  +•  j0.87  -  1.25  4-  1.88  -  jl.08  4-  0.56  -  0.84  4-  j0.48 
4-  jO.36  -  j0.54  -  0.31  4-  jO.59  -  j0.88  -  0.51  =  -0.98  -  j0.2l 

1  -  1.5  -  jO.87  -  1.25  -r  1.88  -4  jl.08  4-  0.56  -  0.84  -  j0.48 
4-  j0.36  -  j0.54  -  0.31  4-  jO.59  -  j0.88  -  0.51  =  0.67  -  j0.74 

l  -  1.25  4-  0.56  -  j0.36  -  jO.59  =  0.31  -  j0.95  =  w3 

1  -  1.5  -  jO.87  -  1.25  4-  1.88  -  jl.08  -  0.56  -  0.84  -4  j0.48 

-  jO.36  -  j0.54  -  0.31  -  jO.59  j0.88  4-  0.51  =  0.67  -  jO.T 

1  -  1.5  -  jO.87  -  1.25  -  1.88  -  jl.08  -  0,56  -  0.84  -  j0.48 

-  jO.36  -  j0.54  -  0.31  -  jO.59  -  j0.88  -  0,51  =  -0.98  -  j0.21 

=  1  -  1.25  -  0,56  -  jO.59  =  -0.81  -  jO.59  =  J 

1  -  1,5  -  jO.87  -  1.25  -  1.88  -  jl.08  -  0,56  -  0.84  -  jO.48 

-  jO.59  -  j0.88  -  0,51  =  -0.10  -  jO.99  =  ^ 


I  -  1,5  -  jO.87  -  1.25  -  1.88  -  j  1.08  -  0,56  -  0.84  *  jO.  18 
-  jO.59  -  jO.88  -  0,51  =  -0.92  -  jO.  ll  = 


Eg  4  =  Eg  5  =  1  -  1.25  -  0.56  +  j0.59  -  -0.81  4-  j0.59  =  w9 
Eg,  =  E98  =  1  -  1.25  4-  0.56  +  j  0.36  -  j0.59  =  0.31  -  j0.95  =  J3 
Eg  io  =  Eg  u  =  1  *  1-25  4-  0.56  -  j  0.36  4-  j0.59  =  0.31  4-  j0.95 
:  Eg  13  =  Eg  H  =  1  '  1-25  -  0.56  -  jO.59  =  -0.81  -  j0.59  =  oj6 
:l=u° 

1  -  1.5  -  j0.87  =  -0.5  -  j0.87  =  u/s 
:  1  -  1.5  4-  jO.87  =  -0.5  4-  j0.87  =  cj10 
1  -  1.25  -  0.56  +  jO.59  =  -0.81  4- jO.59  =  J 

1  -  1.5  -  jO.87  -  1.25  -r  1.88  4-  jl.08  -  0.56  4-  0.84  4-  j0.48 

-  jO.59  -  j0.88  4-  0.51  =  0.92  4-  j0.41  =  u;14 

1  -  1.5  4-  j0.87  -  1.25  1.88  -  jl.08  -  0.56  -  0.84  -  j0.48 

-  jO.59  -  j0.88  -  0.51  =  -0.10  -  jO.99  =  u;4 

l  -  1.25  —  0.56  -  jO.36  -  jO.59  =  0.31  -  j0.95  =  J 

1  -  1.5  -  jO.87  -  1.25  -  1.88  -  jl.08  -  0.56  -  0.84  -  j0.48 

-  jO.36  -  j0.54  -  0.31  -  jO.59  *  j0.88  -  0,51  =  -0.98  -  j0.21  =  u.8 

1  -  1,5  -  jO.87  -  1.25  -  1.88  -  jl.08  -  0,56  -  0.84  -  jO.  18 

-  jO.36  -  jO,54  -  0,31  -  jO,59  -  j0.88  -  0,51  =  0  67  -  jO.7  1  =  „13 

1  -  1 .25  -  0,56  -  jO.36  -  jO,59  =  0  31  -  j()  95  =  J- 


1  -  1,5  -  jO.87  -  1.25  *  1.88  —  jl.08  -  0,56  -  0  SI  -  j(),18 
-  jO.36  -  jO,5  1  -  0.31  -  jO,59  -  j0.88  -  0,51  0  67  -  jO.7  l  = 


1  -  1.5  -r  j0.87  -  1.25  t  1.88  -  jl.08  -r  0.56  -  0.84  -  j0.48 
t-  jO.36  -  j0.54  -  0.31  4-  j0.59  -  jO.88  -  0.51  =  -0.98  -  j0.21 

=  1  -  1.25  -  0.56  -  j0.59  =  -0.81  -  j0.59  =  <J 

1-1.5-  j0.87  -  1.25  -I-  1.88  -I-  jl.08  -  0.56  +  0.84  -h  j0.48 

-  j0.59  4-  j0.88  -  0.51  =  -0.10  +  j0.99  =  u in 

1  -  1.5  +  j0.87  -  1.25  4-  1.88  -  jl.08  -  0.56  -r  0.84  -  j0.48 

-  j0.59  +  j0.88  +  0.51  =  0.92  -  j0.41  =  cj1 

=  1  =  w° 

=  1  -  1.5  -  j0.87  =  -0.5  4-  j0.87  =  cj10 
1  -  1.5  -  j0.87  =  -0.5  -  j0.87  =  J 
1  -  1.25  -  0.56  4-  j0.59  =  -0.81  -  j0.59  =  ^ 

1  -  1.5  4-  j0.87  -  1.25  -  1.88  -  jl.08  -  0.56  -  0.84  -  j0.48 

-  j0.59  -  jO.88  -  0.51  =  -0.10  -  j0.99  =  w4 

1  -  1.5  -  j0.87  -  1.25  -  1.88  -  jl.08  -  0.56  -  0.84  -  j0.48 

-  j0.59  -  jO.88  -  0.51  =  0.92  -  j0  41  =  ~14 

l  -  1.25  -  0.56  -  jO.36  -  j0.59  =  0.31  -  j0.95  =  J 

l  -  1.5  -  j0.87  -  1.25  -  1.88  -  jl.08  -  0.56  -  0.84  -  jO.  18 

-  jO.36  -  j0.54  -  0.31  -  j0.59  -  jO.88  -  0.51  =  0  67  -  jO 

1  -  1.5  -  j0.87  -  1.25  -  1.88  -  jl.08  -  0.56  -  0.84  -  jO  48 

-  jO.36  -  j0.54  -  0.31  -  jO.59  -  jO.88  -  0.51  =  -0  98  -  jO.2 

-  1  -  1.25  -  0.56  *  jO  3(5  -  jO  59  =  0.31  -  j().95  --=■  J' 

ARCHITECTURE AND NUMERICAL ACCURACY OF HIGH-SPEED DFT
(DISCRETE FOURIER T.. (U) AIR FORCE INST OF TECH
WRIGHT-PATTERSON AFB OH SCHOOL OF ENGI.. K TAYLOR
DEC 85 AFIT/GE/ENG/85D-4? F/G 9/1


1  -  1.5  -  j0.87  -  1.25  -  1.88  -  j  1.08  ^  0.56  -  0.84  -t-  j0.48 

-  j0.36  -  j0.54  -  0.31  -  jO.59  -  jO.88  -  0.51  =  -0.98  -  j0.21  =  J 

1  -  1.5  -  jO.87  -  1.25  4-  1.88  4-  jl.08  +  0.56  -  0.84  -  j0.48 
4-  j0.36  -  j0.54  -r  0.31  +  j0.59  -  jO.88  4-  0.51  =  0.67  -  j0.74  =  J1 

=  1  -  1.25  -  0.56  -  jO.59  =  -0.81  -  j0.59  =  w8 

1  -  1.5  -  j0.87  -  1.25  +  1.88  -  jl.08  -  0.56  -  0.84  -  j0.48 

-  jO.59  -  jO.88  +  0.51  =  0.92  -  j0.41  =  J 

1  -  1.5  -  j0.87  -  1.25  4-  1.88  -  jl.08  4-  0.56  -  0.84  4-  j0.48 

-  jO.59  •+■  jO.88  -  0.51  =  -0.10  4-  jO.99  =  ojn 


Ejo  j  —  E[2  2  —  1  —  u;° 

E12  4  =  E125  =  1  -  1.25  -*-  j  1 .53  +  0.56  -  jO.59  =  0.31  -  jO.95  =  ^12 

Ei2,7  =  Ei2.8  =  1  -  1  25  -  0.56  4-  jO.59  =  -0.81  -r  jO.59  =  J 

Ei2.io  =  Ei2.u  =  1  -  1  25  -  0.56  -  jO.59  =  -0.81  -  j  0.59  =  J5 

=  E 13  =  Ei2H  =  *  *  *  25  -  j  1 .53  -  0.56  —  jO.59  =  0.31  -  jO  95  =  ^-3 

1  —  0 
i  -  <A> 




l  -  1.5  -  j0.87  =  -0.5  -  j0.87  =  ^ 

1  -  1.5  -  jO.87  =  -0.5  -  j0.S7  =  J° 

1  -  1.25  -  j  1 .53  -  0.56  -  jO.59  =  0.31  -  jO.95  =  J- 

1  -  1.5  -  jO.87  -  1.25  -  1.88  -  jl.08  -  jl  53  -  j2.31  *  1.33 
-  0.56  -  0.84  -  jO.  18  -  jO  59  -  jO  88  -  0,51  =  0  67  -  j0.7  1 


1-15-  jO.87  -  1.25  -  1.88  -  jl.08  -  jl.53  -  j2.31  -  1  33 
-  0,56  -  0.8-1  -  jO  18  -  j0  59  -  jO.88  -  0  51  =  -0.98  -  jO  21 


1  -  1.25  -  0.56  -  j0.59  =  -0.81  -  j0.59  =  ua 




E,. ,  =  1  -  15  -  j0.87  -  1.25  -  1.88  —  jl.08  —  jl.53  -  j2.31  -  1.33 

1*1.5 

-  0.56  -  0.84  -  j0.48  -  j0.59  -  j0.88  -  0.51  =  0.67  -  j0.74  =  J1 
E148  =  1  -  1.25  -  0.56  -r  j0.59  =  -0.81  4-  j0.59  = 

Eh7  =  1  -  1.5  4-  j0.87  -  1.25  4-  1.88  -  jl.08  4-  0.56  -  0.84  4-  jO.48 

-  J0.59  -  j0.88  -  0.51  =  -0.10  -  jO.99  =  ui 4 

E. .  a  =  1  -  1.5  -  j0.87  -  1.25  -  1.88  -  jl.08  -  0.56  -  0.84  -  jO.48 

-  jO.59  -  j0.88  -  0.51  =  0.92  *  j0.41  =  u;14 

E149  =  1  -  1.25  -  0.56  -  jO.59  =  -0.81  -  jO.59  =  J 

E14  io  =  1  -  1.5  4-  j0.87  -  1.25  -  1.88  -  jl.08  -  0.56  -  0.84  j0.48 

-  jO.59  -  j0.88  4  0.51  =  0.92  -  j0.41  =  a/1 

Eu  n  =  l  -  1.5  -  j0.87  -  1.25  -  1.88  t-  jl.08  -  0.56  -  0.84  -  j0.48 

-  jO.59  -  j0.88  -  0.51  =  -0.10  -  jO.99  =  Jl 

i 

E„  12  =  1  -  1.25  -  jl.53  -  0.56  -  jO.59  =  0.31  -  jO.95  =  J 

E, ,  =  l  -  1.5  -  j0.87  -  1.25  -  1.88  -  jl.08  -  jl.53  -  j2.31  *  1.33 

-  0.56  -  0.84  -  j0.48  *  j0.59  -  j0.88  -  0.51  =  0  67  -  jO.74  =  J3 

E,4 ,4  =  1  -  1.5  -  jO.87  -  1.25  -  1.88  *  jl.08  -  jl.53  -  j2.31  -  1.33 

-  0.56  -  0.84  -  jO.48  -  jO.59  -  j0.88  -  0.51  =  -0.98  -  j0.21  =  J 




Figure A-11 shows the product of the pre-addition, multiplicative, and post-addition
matrices, substituting the terms for the E elements shown in Figure A-10.


The matrix shown in Figure A-11 is identical to the matrix shown in Figure A-3.
Thus, the Kronecker product of the 5-point and 3-point Winograd modules does give a
15-point DFT.




Appendix  B 

Simulation  Program  Listings 


The following program listings indicate the code required for the simulation and
standard modules of the simulation programs. The standard modules are the wino files;
these files compute Winograd algorithms using double-precision arithmetic. The simulation
modules are the sim files; these files compute Winograd algorithms using integer
arithmetic. Only the sim16.c listing is shown; the sim15.c and sim17.c listings are identical
to their wino counterparts except for the multiplication of the pre-addition results
with the coefficients and the coefficients themselves. The manner in which multiplication
is done is illustrated in the sim16.c file. The coefficients for the 15- and 17-point simulation
modules are given following the sim16.c listing. The multiply.c listing shows the
special multiply routine used by the simulation module; this routine accepts a 28-bit
input (data) and a 32-bit input (coefficient) and returns a 32-bit result. The diff programs
show how the standard and simulation modules are combined to give comparison
results from a single program. The stddiff programs show how the standard modules are
combined with the direct DFT.


B.1  SIM16.C


/************************************************************************
 *
 *  Module:   sim16.c
 *
 *  Author:   Kent Taylor
 *
 *  Date:     13 September 1985
 *
 *  Purpose:  To perform a 16-point DFT on complex input data.
 *
 *  Inputs:   x, y, h
 *
 *            x is the array of real input values
 *            y is the array of imaginary input data
 *            h is the index array
 *
 *  Outputs:  x, y
 *
 ************************************************************************/


#define CS160 67108864
#define CS161 47453133
#define CS162 25681450
#define CS163 87681956
#define CS164 36319055
#define CS165 62000506

sim16 (x, y, h)
long x[], y[];
int h[];

{

long r100, r101, r102, r103, r104, r105, r106, r107, r108, r109;
long r110, r111, r112, r113, r114, r115;
long r200, r201, r202, r203, r204, r205, r206, r207, r208, r209;
long r210, r211, r212, r213;
long r300, r301, r302, r303, r304, r305, r306, r307;
long r400, r401;

long t02, t03, t04, t05, t06, t07, t08, t09;
long t10, t11, t12, t13, t14, t15, t16, t17, t18, t19;
long t100, t101, t102, t103, t104, t105, t106, t107, t108, t109, t110, t111;
long t200, t201, t202, t203, t204, t205, t206, t207;

long s100, s101, s102, s103, s104, s105, s106, s107, s108, s109;
long s110, s111, s112, s113, s114, s115;
long s200, s201, s202, s203, s204, s205, s206, s207, s208, s209;
long s210, s211, s212, s213;
long s300, s301, s302, s303, s304, s305, s306, s307;
long s400, s401;

long u00, u01, u02, u03, u04, u05, u06, u07, u08, u09;
long u10, u11, u12, u13, u14, u15, u16, u17, u18, u19;
long u100, u101, u102, u103, u104, u105, u106, u107, u108, u109, u110, u111;
long u200, u201, u202, u203, u204, u205, u206, u207;

/************************************************************************
 *
 *  r variables are the real pre-add equations
 *  s variables are the imaginary pre-add equations
 *
 ************************************************************************/


r100 = x[h[0]] + x[h[8]];
r101 = x[h[0]] - x[h[8]];
r102 = x[h[1]] + x[h[9]];
r103 = x[h[1]] - x[h[9]];
r104 = x[h[2]] + x[h[10]];
r105 = x[h[2]] - x[h[10]];
r106 = x[h[3]] + x[h[11]];
r107 = x[h[3]] - x[h[11]];
r108 = x[h[4]] + x[h[12]];
r109 = x[h[4]] - x[h[12]];
r110 = x[h[5]] + x[h[13]];
r111 = x[h[5]] - x[h[13]];
r112 = x[h[6]] + x[h[14]];
r113 = x[h[6]] - x[h[14]];
r114 = x[h[7]] + x[h[15]];
r115 = x[h[7]] - x[h[15]];
s100 = y[h[0]] + y[h[8]];
s101 = y[h[0]] - y[h[8]];
s102 = y[h[1]] + y[h[9]];
s103 = y[h[1]] - y[h[9]];
s104 = y[h[2]] + y[h[10]];
s105 = y[h[2]] - y[h[10]];
s106 = y[h[3]] + y[h[11]];
s107 = y[h[3]] - y[h[11]];
s108 = y[h[4]] + y[h[12]];
s109 = y[h[4]] - y[h[12]];
s110 = y[h[5]] + y[h[13]];
s111 = y[h[5]] - y[h[13]];
s112 = y[h[6]] + y[h[14]];
s113 = y[h[6]] - y[h[14]];
s114 = y[h[7]] + y[h[15]];
s115 = y[h[7]] - y[h[15]];
r200 = r100 + r108;
r201 = r100 - r108;
r202 = r112 + r104;
r203 = r112 - r104;
r204 = r102 + r110;
r205 = r102 - r110;
r206 = r106 + r114;
r207 = r106 - r114;
r208 = r103 + r115;
r209 = r103 - r115;
r210 = r111 + r107;
r211 = r111 - r107;
r212 = r105 + r113;
r213 = r105 - r113;
s200 = s100 + s108;
s201 = s100 - s108;
s202 = s112 + s104;
s203 = s112 - s104;
s204 = s102 + s110;
s205 = s102 - s110;
s206 = s106 + s114;
s207 = s106 - s114;
s208 = s103 + s115;
s209 = s103 - s115;
s210 = s111 + s107;
s211 = s111 - s107;
s212 = s105 + s113;
s213 = s105 - s113;
r300 = r200 + r202;
r301 = r200 - r202;
r302 = r206 + r204;
r303 = r206 - r204;
r304 = r205 + r207;
r305 = r205 - r207;
r306 = r209 + r211;
r307 = r208 + r210;
s300 = s200 + s202;
s301 = s200 - s202;
s302 = s206 + s204;
s303 = s206 - s204;
s304 = s205 + s207;
s305 = s205 - s207;
s306 = s209 + s211;
s307 = s208 + s210;
r400 = r300 + r302;
r401 = r300 - r302;
s400 = s300 + s302;
s401 = s300 - s302;
x[h[0]] = mult (r400, C160);
x[h[8]] = mult (r401, C160);
y[h[0]] = mult (s400, C160);
y[h[8]] = mult (s401, C160);

/************************************************************************
 *
 *  After the pre-adds, the sums are multiplied by the proper
 *  coefficients.
 *
 ************************************************************************/

t02 = mult (r301, C160);
t03 = mult (r201, C160);
t04 = mult (r101, C160);
t05 = mult (r305, C161);
t06 = mult (r213, C161);
t07 = mult (r306, C162);
t08 = mult (r209, C163);
t09 = -(mult (r211, C164));
t10 = mult (r303, C160);
t11 = mult (r203, C160);
t12 = -(mult (r109, C160));
t13 = -(mult (r304, C161));
t14 = -(mult (r212, C161));
t15 = -(mult (r307, C165));
t16 = mult (r208, C164);
t17 = -(mult (r210, C163));
u02 = mult (s301, C160);
u03 = mult (s201, C160);
u04 = mult (s101, C160);
u05 = mult (s305, C161);
u06 = mult (s213, C161);
u07 = mult (s306, C162);
u08 = mult (s209, C163);
u09 = -(mult (s211, C164));
u10 = mult (s303, C160);
u11 = mult (s203, C160);
u12 = -(mult (s109, C160));
u13 = -(mult (s304, C161));
u14 = -(mult (s212, C161));
u15 = -(mult (s307, C165));
u16 = mult (s208, C164);
u17 = -(mult (s210, C163));

/************************************************************************
 *
 *  t variables are the real post-add equations
 *  u variables are the imaginary post-add equations
 *
 ************************************************************************/

y[h[15]] = u200 - t204;
y[h[2]] = u100 + t102;
y[h[14]] = u100 - t102;
y[h[3]] = u203 - t207;
y[h[13]] = u203 + t207;
y[h[4]] = u02 + t10;
y[h[12]] = u02 - t10;
y[h[5]] = u202 + t206;
y[h[11]] = u202 - t206;
y[h[6]] = u101 + t103;
y[h[10]] = u101 - t103;
y[h[7]] = u201 - t205;
y[h[9]] = u201 + t205;
return;

}


SIM15.C  Coefficients 


CS1500  33554432
CS1501  -50331648
CS1502  -29058990
CS1503  -41943040
CS1504  62914560
CS1505  36323738
CS1506  -51634961
CS1507  77452442
CS1508  44717188
CS1509  18757497
CS1510  -28136246
CS1511  -16244469
CS1512  12189360
CS1513  -18284041
CS1514  -10556296
CS1515  19722800
CS1516  -29584200
CS1517  -17080446

SIM17.C  Coefficients 

CS1700  16777216
CS1701  714757
CS1702  3438987
CS1703  17535269
CS1704  29604821
CS1705  -12136771
CS1706  -1494104
CS1707  -17825792
CS1708  4323389
CS1709  13082916
CS1710  9125013
CS1711  7018140
CS1712  21493173
CS1713  7396891
CS1714  5321333
CS1715  -15122700
CS1716  -7255937
CS1717  11189318
CS1718  -10131593
CS1719  -6194965
CS1720  8163279
CS1721  3995277
CS1722  -26128535
CS1723  11066628
CS1724  -2401987
CS1725  4010336
CS1726  -804174
CS1727  -38903033
CS1728  13239667
CS1729  64566399
CS1730  -21816763
CS1731  68475819
CS1732  -21842292
CS1733  -223681
CS1734  -6231020
CS1735  3227351
B.2  MULT.C


/************************************************************************
 *
 *  Module:   mult.c
 *
 *  Author:   Kent Taylor
 *
 *  Date:     29 October 1985
 *
 *  Purpose:  To perform a multiplication of two long (32-bit)
 *            integers, returning a 32-bit result.  The algorithm
 *            used in this program came from "The Design and
 *            Analysis of Computer Algorithms", Aho, Hopcroft, and
 *            Ullman, Addison Wesley Publishing Co., Reading MA,
 *            1974, pp 62-64.
 *
 *  Inputs:   a, b
 *
 *            a is the 32-bit input data word
 *            b is the 28-bit coefficient
 *
 *  Outputs:  product
 *
 *  The product x*y can be represented by splitting x and y into
 *  left and right halves, then combining products and sums of the
 *  halves to obtain an n-bit product from the multiplication of
 *  two n-bit numbers.  In Aho, et al., the left and right halves
 *  of x and y are a, b, c, and d, respectively.  The product is
 *  found as follows:
 *
 *                    a    b
 *                    c    d
 *
 *                   ad + bd
 *              ac + bc
 *
 *          k1ac + k2(ad + bc) + k3bd
 *
 *  where k1 = 2**n, k2 = 2**(n/2), and k3 = 1
 *
 *  Noting that:  (a+b)(c+d) = ac + ad + bc + bd
 *  one can write the middle term as:  (a+b)(c+d) - ac - bd
 *
 *  The sums (a+b) and (c+d) may overflow the (n/2)-bit
 *  representation; thus, the sums are rewritten using a 16-bit
 *  number to represent the sum and a one-bit number to represent
 *  the overflow (if any).  The final representation in
 *  Aho, et al. for the sums is (a+b) = a1 + b1 where a1 is the
 *  overflow bit and b1 is the remaining 16 bits of the sum
 *  (a like expression is obtained for the (c+d) sum).
 *
 ************************************************************************/

#define RMASK 0177777
#define RDMASK 0100000000
#define ROUT 077
#define OVER 0200000
#define MSB 020000000000
#define LSB 01
#define TWO16 65536

long mult (a, b)
long a, b;

{

long product;

unsigned long subprod1, subprod2, prod1, prod2, prod3, prod4;
unsigned long lword, rword, upper16a, lower16a, upper16b, lower16b;
unsigned long sumx, sumy, leftx, rightx, lefty, righty, xbit, ybit;
unsigned long round, over2, over3, signx, signy;

/************************************************************************
 *
 *  Initialize the variables.
 *
 ************************************************************************/

sumx = sumy = xbit = ybit = signx = signy = round = over2 = 0;
over3 = lword = rword = upper16a = lower16a = 0;
upper16b = lower16b = 0;
prod1 = prod2 = prod3 = prod4 = 0;
product = 0;

/************************************************************************
 *
 *  Change the inputs to unsigned numbers since the algorithm
 *  in Aho, et al. assumes positive numbers.
 *
 ************************************************************************/

if  (a  <  0) 

{ 

a  =  -a; 
signx  =  1; 

} 

if  (b  <  0) 

{ 

b  =  -b; 
signy  =  1; 

} 

/************************************************************************
 *
 *  Find the leftmost and rightmost 16 bits of each number.
 *
 *  leftx = a   rightx = b   lefty = c   righty = d
 *
 ************************************************************************/

leftx = a >> 16;
rightx = a & RMASK;
lefty = b >> 16;
righty = b & RMASK;

/************************************************************************
 *
 *  Form the first sub-product.
 *
 ************************************************************************/

prod1 = leftx * lefty;

/************************************************************************
 *
 *  Form the second sub-product.
 *
 ************************************************************************/

prod2 = rightx * righty;

/************************************************************************
 *
 *  Form the third sub-product.
 *
 *  xbit = a1   ybit = c1   sumx = (a+b)   sumy = (c+d)
 *
 ************************************************************************/
sumx = leftx + rightx;
sumy = lefty + righty;
if (sumx >= TWO16)

{

xbit = 1;
sumx = sumx - TWO16;

}

if (sumy >= TWO16)

{

ybit = 1;
sumy = sumy - TWO16;

}

subprod1 = (xbit * sumy) + (ybit * sumx);
subprod2 = sumx * sumy;
if (subprod1 >= TWO16)

{

over2 = 1;
subprod1 = subprod1 - TWO16;

}

prod3 = (subprod1 << 16) - prod1 + subprod2 - prod2;

/************************************************************************
 *
 *  The final product is a 64-bit result, composed of two 32-bit
 *  words (lword and rword).  lword is the sum of prod1, the
 *  most significant 16 bits of prod3 (upper16), and the overflow
 *  bits from prod3.  rword is the sum of prod2 and the least
 *  significant 16 bits of prod3 (lower16).
 *
 ************************************************************************/

upper16a = (prod3 >> 16) & RMASK;
lower16a = (prod3 & RMASK) << 16;

/************************************************************************
 *
 *  Find rword first, since there may be an overflow.
 *
 ************************************************************************/

rword = (prod2 >> 1) + (lower16a >> 1);
if ((rword & MSB) != 0)

{

rword = (rword << 1) + (prod2 & LSB) + (lower16a & LSB);
over3 = 1;

}

else rword = prod2 + lower16a;

lword = prod1 + upper16a + over3 + (((xbit & ybit) + over2) <<
round = (RDMASK & rword);

product = (lword << 6) + (ROUT & (rword >> 26));
if (round != 0) product++;
if ((signx ^ signy) == 1) product = -product;
return (product);

}
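The splitting described in the mult.c header can be sketched compactly when a 64-bit integer type is available. The following is only an illustration of the three sub-product identity with n = 32, k1 = 2**32, and k2 = 2**16; the function name mul_split and the use of stdint.h types are assumptions and are not part of the listing above, which works entirely in 32-bit words.

```c
#include <stdint.h>

/* Hypothetical helper: 32 x 32 -> 64-bit unsigned product formed from
 * 16-bit halves using three multiplications. */
uint64_t mul_split(uint32_t x, uint32_t y)
{
    uint64_t a = x >> 16, b = x & 0xFFFFu;   /* left, right half of x */
    uint64_t c = y >> 16, d = y & 0xFFFFu;   /* left, right half of y */

    uint64_t ac  = a * c;                         /* first sub-product  */
    uint64_t bd  = b * d;                         /* second sub-product */
    uint64_t mid = (a + b) * (c + d) - ac - bd;   /* equals ad + bc     */

    return (ac << 32) + (mid << 16) + bd;
}
```

Because the sums (a+b) and (c+d) are held in 64-bit variables here, the one-bit overflow bookkeeping of the listing is not needed in this sketch.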


B.3  WINO15.C


/************************************************************************
 *
 *  Module:   wino15.c
 *
 *  Author:   Kent Taylor
 *
 *  Date:     11 September 1985
 *
 *  Purpose:  To compute a 15-point DFT on complex input data.
 *
 *  Inputs:   x, y, h
 *
 *            x is the array of real input data
 *            y is the array of imaginary input data
 *            h is the index array
 *
 *  Outputs:  x, y
 *
 ************************************************************************/

#define C1501 -1.50000000000000
#define C1502 -0.86602540378444
#define C1503 -1.25000000000000
#define C1504 1.87500000000000
#define C1505 1.08253175473055
#define C1506 -1.53884176858764
#define C1507 2.30826265288146
#define C1508 1.33267606400146
#define C1509 0.55901699437494
#define C1510 -0.83852549156241
#define C1511 -0.48412291827592
#define C1512 0.36327126400266
#define C1513 -0.54490689600399
#define C1514 -0.31460214309119
#define C1515 0.58778525229247
#define C1516 -0.88167787843870
#define C1517 -0.50903696045512

std15 (x, y, h)
double x[], y[];
int h[];

{

double r100, r101, r102, r103, r104, r105, r106, r107, r108, r109;
double r200, r201, r202, r203, r204, r205, r206, r207, r208, r209;
double r210, r211;
double r300, r301, r302, r303, r304, r305, r306, r307, r308, r309;
double r310, r311;
double r400, r401, r402, r403, r404, r405, r406, r407, r408;
double r500, r501, r502;

double t00, t01, t02, t03, t04, t05, t06, t07, t08, t09;
double t10, t11, t12, t13, t14, t15, t16, t17;
double t100, t101, t102;
double t200, t201, t202, t203, t204, t205, t206, t207, t208, t209;
double t210, t211;
double t300, t301, t302, t303, t304, t305, t306, t307, t308, t309;
double t310, t311;
double t400, t401, t402, t403, t404;
double t500, t501, t502, t503, t504, t505, t506, t507, t508, t509;
double t510, t511, t512, t513, t514;

double s100, s101, s102, s103, s104, s105, s106, s107, s108, s109;
double s200, s201, s202, s203, s204, s205, s206, s207, s208, s209;
double s210, s211;
double s300, s301, s302, s303, s304, s305, s306, s307, s308, s309;
double s310, s311;
double s400, s401, s402, s403, s404, s405, s406, s407, s408;
double s500, s501, s502;

double u00, u01, u02, u03, u04, u05, u06, u07, u08, u09;
double u10, u11, u12, u13, u14, u15, u16, u17;
double u100, u101, u102;
double u200, u201, u202, u203, u204, u205, u206, u207, u208, u209;
double u210, u211;
double u300, u301, u302, u303, u304, u305, u306, u307, u308, u309;
double u310, u311;
double u400, u401, u402, u403, u404;
double u500, u501, u502, u503, u504, u505, u506, u507, u508, u509;
double u510, u511, u512, u513, u514;


/* Scramble the input data according to PFA rules. */

t01 = x[h[5]];
t02 = x[h[10]];
t03 = x[h[3]];
t04 = x[h[8]];
t05 = x[h[13]];
t06 = x[h[6]];
t07 = x[h[11]];
t08 = x[h[1]];
t09 = x[h[9]];
t10 = x[h[14]];
t11 = x[h[4]];
t12 = x[h[12]];
t13 = x[h[2]];
t14 = x[h[7]];
u01 = y[h[5]];
u02 = y[h[10]];
u03 = y[h[3]];
u04 = y[h[8]];
u05 = y[h[13]];
u06 = y[h[6]];
u07 = y[h[11]];
u08 = y[h[1]];
u09 = y[h[9]];
u10 = y[h[14]];
u11 = y[h[4]];
u12 = y[h[12]];
u13 = y[h[2]];
u14 = y[h[7]];


/* r variables are the real pre-add equations      */
/* s variables are the imaginary pre-add equations */

/* r100, s100 eqns are 3-point pre-adds */

r100 = t01 + t02;
r101 = t01 - t02;
r102 = t04 + t05;
r103 = t04 - t05;
r104 = t07 + t08;
r105 = t07 - t08;
r106 = t10 + t11;
r107 = t10 - t11;
r108 = t13 + t14;
r109 = t13 - t14;
s100 = u01 + u02;
s101 = u01 - u02;
s102 = u04 + u05;
s103 = u04 - u05;
s104 = u07 + u08;
s105 = u07 - u08;
s106 = u10 + u11;
s107 = u10 - u11;
s108 = u13 + u14;
s109 = u13 - u14;

/* r200, s200 eqns are 3-point pre-adds */

r200 = r100 + t00;
r201 = r102 + t03;
r202 = r104 + t06;
r203 = r106 + t09;
r204 = r108 + t12;
s200 = s100 + u00;
s201 = s102 + u03;
s202 = s104 + u06;
s203 = s106 + u09;
s204 = s108 + u12;

/* r300, s300 eqns are 5-point pre-adds */

r300 = r201 + r204;
r301 = r201 - r204;
r302 = r102 + r108;
r303 = r102 - r108;
r304 = r103 + r109;
r305 = r103 - r109;
r306 = r203 + r202;
r307 = r203 - r202;
r308 = r106 + r104;
r309 = r106 - r104;
r310 = r107 + r105;
r311 = r107 - r105;
s300 = s201 + s204;


/************************************************************************
 *
 *  After the pre-adds, the sums are multiplied by the proper
 *  coefficients.
 *
 ************************************************************************/


t00 = r500;
t01 = C1501 * r501;
t02 = C1502 * r502;
t03 = C1503 * r400;
t04 = C1504 * r402;
t05 = C1505 * r404;
t06 = C1506 * r301;
t07 = C1507 * r303;
t08 = C1508 * r305;
t09 = C1509 * r401;
t10 = C1510 * r403;
t11 = C1511 * r405;
t12 = C1512 * r307;
t13 = C1513 * r309;
t14 = C1514 * r311;
t15 = C1515 * r406;
t16 = C1516 * r407;
t17 = C1517 * r408;
u00 = s500;
u01 = C1501 * s501;
u02 = C1502 * s502;
u03 = C1503 * s400;
u04 = C1504 * s402;
u05 = C1505 * s404;
u06 = C1506 * s301;
u07 = C1507 * s303;
u08 = C1508 * s305;
u09 = C1509 * s401;
u10 = C1510 * s403;
u11 = C1511 * s405;
u12 = C1512 * s307;
u13 = C1513 * s309;
u14 = C1514 * s311;
u15 = C1515 * s406;
u16 = C1516 * s407;
u17 = C1517 * s408;

/************************************************************************
 *
 *  t variables are the real post-add equations
 *  u variables are the imaginary post-add equations
 *
 ************************************************************************/

/* t100, u100 eqns are 5-point post-adds */

t100 = t00 + t03;
t101 = t01 + t04;
t102 = t02 + t05;
u100 = u00 + u03;
u101 = u01 + u04;
u102 = u02 + u05;

/* t200, u200 eqns are 5-point post-adds */

t200 = t100 + t09;
t201 = t100 - t09;
t202 = t101 + t10;
t203 = t101 - t10;
t204 = t102 + t11;
t205 = t102 - t11;
t206 = t06 + t15;
t207 = t07 + t16;
t208 = t08 + t17;
t209 = t12 + t15;
t210 = t13 + t16;
t211 = t14 + t17;
u200 = u100 + u09;
u201 = u100 - u09;
u202 = u101 + u10;
u203 = u101 - u10;
u204 = u102 + u11;
u205 = u102 - u11;
u206 = u06 + u15;
u207 = u07 + u16;
u208 = u08 + u17;
u209 = u12 + u15;
u210 = u13 + u16;
u211 = u14 + u17;

/* t300, u300 eqns are 5-point post-adds */

t300 = t200 - u206;
t301 = t200 + u206;
t302 = t202 - u207;
t303 = t202 + u207;
t304 = t204 - u208;
t305 = t204 + u208;
t306 = t201 - u209;
t307 = t201 + u209;
t308 = t203 - u210;
t309 = t203 + u210;
t310 = t205 - u211;
t311 = t205 + u211;
u300 = u200 + t206;
u301 = u200 - t206;
u302 = u202 + t207;
u303 = u202 - t207;
u304 = u204 + t208;
u305 = u204 - t208;
u306 = u201 + t209;
u307 = u201 - t209;
u308 = u203 + t210;
u309 = u203 - t210;
u310 = u205 + t211;
u311 = u205 - t211;

/* t400, u400 eqns are 3-point post-adds */

t400 = t00 + t01;
t401 = t300 + t302;
t402 = t301 + t303;
t403 = t306 + t308;
t404 = t307 + t309;
u400 = u00 + u01;
u401 = u300 + u302;
u402 = u301 + u303;
u403 = u306 + u308;
u404 = u307 + u309;

/* t500, u500 eqns are 3-point post-adds */

t500 = t00;
t501 = t400 - u02;


B.4  WINO16.C


/************************************************************************
 *
 *  Module:   wino16
 *
 *  Author:   Kent Taylor
 *
 *  Date:     11 August 1985
 *
 *  Purpose:  To perform a 16-point DFT on complex input data.
 *
 *  Inputs:   a, b, h
 *
 *            a is the array of real input values
 *            b is the array of imaginary input data
 *            h is the index array
 *
 *  Outputs:  a, b
 *
 ************************************************************************/


#define C1601 0.70710678118654
#define C1602 0.38268343236510
#define C1603 1.30656296487638
#define C1604 0.54119610014619
#define C1605 0.92387953251128

std16 (a, b, h)
double a[], b[];
int h[];

{

double r100, r101, r102, r103, r104, r105, r106, r107, r108, r109;
double r110, r111, r112, r113, r114, r115;
double r200, r201, r202, r203, r204, r205, r206, r207, r208, r209;
double r210, r211, r212, r213;
double r300, r301, r302, r303, r304, r305, r306, r307;

double t02, t03, t04, t05, t06, t07, t08, t09;
double t10, t11, t12, t13, t14, t15, t16, t17, t18, t19;
double t100, t101, t102, t103, t104, t105, t106, t107, t108, t109;
double t110, t111;
double t200, t201, t202, t203, t204, t205, t206, t207;

double s100, s101, s102, s103, s104, s105, s106, s107, s108, s109;
double s110, s111, s112, s113, s114, s115;
double s200, s201, s202, s203, s204, s205, s206, s207, s208, s209;
double s210, s211, s212, s213;
double s300, s301, s302, s303, s304, s305, s306, s307;

double u00, u01, u02, u03, u04, u05, u06, u07, u08, u09;
double u10, u11, u12, u13, u14, u15, u16, u17, u18, u19;
double u100, u101, u102, u103, u104, u105, u106, u107, u108, u109;
double u110, u111;
double u200, u201, u202, u203, u204, u205, u206, u207;

/************************************************************************
 *
 *  r variables are the real pre-add equations
 *  s variables are the imaginary pre-add equations
 *
 ************************************************************************/

r100 = a[h[0]] + a[h[8]];
r101 = a[h[0]] - a[h[8]];
r102 = a[h[1]] + a[h[9]];
r103 = a[h[1]] - a[h[9]];
r104 = a[h[2]] + a[h[10]];
r105 = a[h[2]] - a[h[10]];
r106 = a[h[3]] + a[h[11]];
r107 = a[h[3]] - a[h[11]];
r108 = a[h[4]] + a[h[12]];
r109 = a[h[4]] - a[h[12]];
r110 = a[h[5]] + a[h[13]];
r111 = a[h[5]] - a[h[13]];
r112 = a[h[6]] + a[h[14]];
r113 = a[h[6]] - a[h[14]];
r114 = a[h[7]] + a[h[15]];
r115 = a[h[7]] - a[h[15]];
s100 = b[h[0]] + b[h[8]];
s101 = b[h[0]] - b[h[8]];


/************************************************************************
 *
 *  After the pre-adds, the sums are multiplied by the proper
 *  coefficients.
 *
 ************************************************************************/

t02 = r301;
t03 = r201;
t04 = r101;
t05 = r305 * C1601;
t06 = r213 * C1601;
t07 = r306 * C1602;
t08 = r209 * C1603;
t09 = -(r211 * C1604);
t10 = r303;
t11 = r203;
t12 = -(r109);
t13 = -(r304 * C1601);
t14 = -(r212 * C1601);
t15 = -(r307 * C1605);
t16 = r208 * C1604;
t17 = -(r210 * C1603);


u05 = s305 * C1601;
u06 = s213 * C1601;
u07 = s306 * C1602;
u08 = s209 * C1603;
u09 = -(s211 * C1604);
u10 = s303;
u11 = s203;
u12 = -(s109);
u13 = -(s304 * C1601);
u14 = -(s212 * C1601);
u15 = -(s307 * C1605);
u16 = s208 * C1604;
u17 = -(s210 * C1603);


b[h[12]] = u02 - t10;
b[h[5]] = u202 + t206;
b[h[11]] = u202 - t206;
b[h[6]] = u101 + t103;
b[h[10]] = u101 - t103;
b[h[7]] = u201 - t205;
b[h[9]] = u201 + t205;
return;

}


B.5  WINO17.C


/************************************************************************
 *
 *  Module:   wino17
 *
 *  Author:   Kent Taylor
 *
 *  Date:     11 August 1985
 *
 *  Purpose:  To perform a 17-point DFT on complex input data.
 *
 *  Inputs:   x, y, h
 *
 *            x is the array of real input values
 *            y is the array of imaginary input data
 *            h is the index array
 *
 *  Outputs:  x, y
 *
 ************************************************************************/


#define C1701 -0.0426028491177360
#define C1702 0.2049796502326218
#define C1703 1.0451835201736758
#define C1704 1.7645848660222969
#define C1705 -0.7234079772860566
#define C1706 -0.0890555916206064
#define C1707 -1.0625000000000000
#define C1708 0.2576941016011038
#define C1709 0.7798026078948376
#define C1710 0.5438931846457058
#define C1711 0.4201019349705270
#define C1712 1.2810929434228074
#define C1713 0.4408890734817534
#define C1714 0.3171761928327251
#define C1715 -0.9013831864801668
#define C1716 -0.4324875636007231
#define C1717 0.6669353750404450
#define C1718 -0.6038900431251697
#define C1719 -0.3692487319858255
#define C1720 0.4865693875554976
#define C1721 0.2381371213676061
#define C1722 -1.5573820617422459
#define C1723 0.6596224701873199
#define C1724 -0.1431696156986624
#define C1725 0.2390346995986077
#define C1726 -0.0479325419499726
#define C1727 -2.3188014856550064
#define C1728 0.7891456841920625
#define C1729 3.8484572871179504
#define C1730 -1.3003804568801376
#define C1731 4.0814769046889033
#define C1732 -1.4807159909286282
#define C1733 -0.0133324703635514
#define C1734 -0.3713977869055763
#define C1735 0.1923651286345638

std17 (x, y, h)
double x[], y[];
int h[];

{

double r100, r101, r102, r103, r104, r105, r106, r107, r108, r109;
double r110, r111, r112, r113, r114, r115;
double r200, r201, r202, r203, r204, r205, r206, r207, r208, r209;
double r210, r211, r212, r213, r214, r215, r216, r217;
double r300, r301, r302, r303, r304, r305, r306, r307, r308, r309;
double r310, r311, r312, r313, r314, r315, r316, r317, r318, r319;
double r320, r321, r322, r323, r324, r325, r326, r327, r328, r329;
double r330, r331, r332, r333, r334, r335;

double t100, t101, t102, t103, t104, t105, t106, t107, t108, t109;
double t110, t111, t112, t113, t114, t115, t116, t117, t118, t119;
double t120, t121, t122, t123, t124, t125, t126, t127, t128, t129;
double t130, t131, t132, t133, t134, t135;
double t200, t201, t202, t203, t204, t205, t206, t207, t208, t209;
double t210, t211, t212, t213, t214, t215, t216, t217, t218, t219;
double t220, t221, t222, t223, t224, t225, t226, t227, t228, t229;
double t230, t231, t232, t233, t234, t235, t236, t237, t238, t239;
double t240, t241, t242, t243, t244, t245, t246, t247;
double t301, t302, t303, t304, t305, t306, t307, t308, t309;
double t310, t311, t312, t313, t314, t315, t316, t317;

double s100, s101, s102, s103, s104, s105, s106, s107, s108, s109;
double s110, s111, s112, s113, s114, s115;
double s200, s201, s202, s203, s204, s205, s206, s207, s208, s209;
double s210, s211, s212, s213, s214, s215, s216, s217;
double s300, s301, s302, s303, s304, s305, s306, s307, s308, s309;
double s310, s311, s312, s313, s314, s315, s316, s317, s318, s319;
double s320, s321, s322, s323, s324, s325, s326, s327, s328, s329;
double s330, s331, s332, s333, s334, s335;

double u100, u101, u102, u103, u104, u105, u106, u107, u108, u109;
double u110, u111, u112, u113, u114, u115, u116, u117, u118, u119;
double u120, u121, u122, u123, u124, u125, u126, u127, u128, u129;
double u130, u131, u132, u133, u134, u135;
double u200, u201, u202, u203, u204, u205, u206, u207, u208, u209;
double u210, u211, u212, u213, u214, u215, u216, u217, u218, u219;
double u220, u221, u222, u223, u224, u225, u226, u227, u228, u229;
double u230, u231, u232, u233, u234, u235, u236, u237, u238, u239;
double u240, u241, u242, u243, u244, u245, u246, u247;
double u301, u302, u303, u304, u305, u306, u307, u308, u309;
double u310, u311, u312, u313, u314, u315, u316, u317;

/************************************************************************
 *
 *  r variables are the real pre-add equations.
 *
 ************************************************************************/

r100 = x[h[1]] + x[h[16]];
r108 = x[h[1]] - x[h[16]];
r101 = x[h[3]] + x[h[14]];


r328 = r108;
r329 = r112;
r330 = r111 + r115;
r331 = r111;
r332 = r115;

r333 = r322 - r316 + r108 - r330;
r334 = r315 - r321 + r111 + r112 - r115;
r335 = r333 + r334;

/************************************************************************
 *
 *  After the pre-adds, the sums are multiplied by the proper
 *  coefficients.
 *
 ************************************************************************/

x[h[0]] = x[h[0]] + r307;
t101 = r301 * C1701;
t102 = r302 * C1702;
t103 = r303 * C1703;
t104 = r304 * C1704;
t105 = r305 * C1705;
t106 = r306 * C1706;
t107 = r307 * C1707;
t108 = r308 * C1708;
t109 = r309 * C1709;
t110 = r310 * C1710;
t111 = r311 * C1711;
t112 = r312 * C1712;
t113 = r313 * C1713;
t114 = r314 * C1714;
t115 = r315 * C1715;
t116 = r316 * C1716;
t117 = r317 * C1717;
t118 = r318 * C1718;
t119 = r319 * C1719;
t120 = r320 * C1720;
t121 = r321 * C1721;
t122 = r322 * C1722;
t123 = r323 * C1723;
t124 = r324 * C1724;
t125 = r325 * C1725;
t126 = r326 * C1726;
t127 = r327 * C1727;
t128 = r328 * C1728;
t129 = r329 * C1729;
t130 = r330 * C1730;
t131 = r331 * C1731;
t132 = r332 * C1732;
t133 = r333 * C1733;
t134 = r334 * C1734;
t135 = r335 * C1735;
t107 = t107 - x[h[0]];

/************************************************************************
*
*  t variables are the real post-add equations.
*
************************************************************************/

t200 = t109 - t111;
t201 = t110 - t111;
t202 = t104 - t112;
t203 = t112 - t103;
t204 = t102 - t113;
t205 = t101 - t113;
t206 = t114 - t106;
t207 = t114 - t105;
t208 = t108 - t107;
t209 = t107 - t108;
t210 = t200 - t202;
t211 = t206 - t208;
t212 = t201 - t203;
t213 = t207 - t209;
t214 = t200 - t204;
t215 = t208 - t206;
t216 = t201 - t205;
t217 = t209 - t207;
t302 = t210 - t211;




)  =  t215  -  t214: 
l  =  t217  -  t216; 

)  =  tll5  +  tll7; 

L  =  tl  16  +  tll7; 

!  =  tll8  +  tl20; 

I  =  tll9  -+-  tl20; 
l  =  tl21  +  tl23; 

>  =  tl22  +  tl23; 

»  =  tl24  +  tl26; 

'  =  tl25  +  tl26; 

I  =  tl35  +  tl34; 

I  =  tl27  +  t228; 

I  =  t229  +  tl28; 

=  t220  +  t222; 

:  =  t220  -  t222; 

=  t221  +  t223; 

=  t221  -  t223; 

=  t224  +  t226; 

=  t224  -  t226; 

=  t225  +  t227; 

=  t225  -  t227; 

=  tl33  -  tl34; 

=  C229  +  tl29; 

=  t239  4-  t239; 

=  tl30  -  t241; 

=  t242  -  tl31; 

=  -(t242  +  tl32); 

=  t228  *  t228; 

=  t245  -  t245; 

=  t239  j-  t245: 

=  1.233  -  t237  -  t240: 

=  t232  -  t238  -  t.243: 

=  t231  -  t235  -  t245: 

=  -(t232  *  t238  -  t247); 

=  t231  *  t235  -  t230  *  t239: 
=  t244  -  1246  -  t234  -  t236; 

t237  -  t233  -  t241  -  t245; 
=  t234  -  t236  -  t239; 
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********************************** ************ ************************** 


*
*  s variables are the imaginary pre-add equations.
*
************************************************************************/

s100 = y[h[1]] + y[h[16]];
s108 = y[h[1]] - y[h[16]];
s101 = y[h[3]] + y[h[14]];
s109 = y[h[3]] - y[h[14]];
s102 = y[h[9]] + y[h[8]];
s110 = y[h[9]] - y[h[8]];
s103 = y[h[10]] + y[h[7]];
s111 = y[h[10]] - y[h[7]];
s104 = y[h[13]] + y[h[4]];
s112 = y[h[13]] - y[h[4]];
s105 = y[h[5]] + y[h[12]];
s113 = y[h[5]] - y[h[12]];
s106 = y[h[15]] + y[h[2]];
s114 = y[h[15]] - y[h[2]];
s107 = y[h[11]] + y[h[6]];
s115 = y[h[11]] - y[h[6]];
s200 = s100 + s104;
s201 = s101 + s105;
s202 = s102 + s106;
s203 = s103 + s107;
s204 = s200 + s202;
s205 = s201 + s203;
s301 = s100 - s104;
s302 = s101 - s105;
s303 = s102 - s106;
s304 = s103 - s107;
s305 = s200 - s202;
s306 = s201 - s203;
s307 = s204 + s205;
s308 = s204 - s205;
s309 = s302 - s304;
s310 = s301 - s303;
s311 = s310 - s309;
s312 = s303 - s304;
s313 = s301 - s302;
s314 = s305 - s306;
s210 = s108 + s110;
s211 = s109 + s111;
s212 = s108 - s110;
s213 = s115 - s113;
s214 = s112 + s114;
s215 = s113 + s115;
s216 = s112 - s114;
s217 = s109 - s111;
s315 = s210 + s211;
s316 = s214 + s215;
s317 = s315 + s316;
s318 = s210 - s211;
s319 = s214 - s215;
s320 = s318 + s319;
s321 = s212 + s213;
s322 = s216 + s217;
s323 = s321 + s322;
s324 = s212 - s213;
s325 = s216 - s217;
s326 = s324 - s325;
s327 = s108 + s112;
s328 = s108;
s329 = s112;
s330 = s111 + s115;
s331 = s111;
s332 = s115;

s333 = s322 - s316 - s108 - s330;
s334 = s315 - s321 + s111 - s112 - s115;
s335 = s333 - s334;

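/************************************************************************
*
*  After the pre-adds, the sums are multiplied by the proper
*  coefficients.
*
************************************************************************/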


y[h[0]] = y[h[0]] + s307;
u101 = s301 * C1701;
u102 = s302 * C1702;
u103 = s303 * C1703;
u104 = s304 * C1704;
u105 = s305 * C1705;
u106 = s306 * C1706;
u107 = s307 * C1707;
u108 = s308 * C1708;
u109 = s309 * C1709;
u110 = s310 * C1710;
u111 = s311 * C1711;
u112 = s312 * C1712;
u113 = s313 * C1713;
u114 = s314 * C1714;
u115 = s315 * C1715;
u116 = s316 * C1716;
u117 = s317 * C1717;
u118 = s318 * C1718;
u119 = s319 * C1719;
u120 = s320 * C1720;
u121 = s321 * C1721;
u122 = s322 * C1722;
u123 = s323 * C1723;
u124 = s324 * C1724;
u125 = s325 * C1725;
u126 = s326 * C1726;
u127 = s327 * C1727;
u128 = s328 * C1728;
u129 = s329 * C1729;
u130 = s330 * C1730;
u131 = s331 * C1731;
u132 = s332 * C1732;
u133 = s333 * C1733;
u134 = s334 * C1734;
u135 = s335 * C1735;


*  u  variables  are  the  imaginary  post-add  equations. 

* 


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B.8  DIFF15.C 


/************************************************************************
*
*  Program:  diff15.c
*
*  Author:   Kent Taylor
*
*  Date:     20 November 1985
*
*  Purpose:  To compare the outputs of simulation and standard
*            implementations of Winograd's 15-point DFT algorithm.
*            The simulation routine uses integer arithmetic,
*            while the standard uses double precision real
*            arithmetic.  The comparison is done using
*            signal-to-noise ratio (SNR) as the measure.  The
*            SNR is computed by summing the magnitude of all
*            signal components (standard outputs) and dividing
*            by the sum of all the noise components (standard
*            minus simulation).  The SNR is expressed in dB
*            (10 log ratio); the magnitude of noise is also
*            stored for comparison.
*
*  Inputs:   files of random numbers
*
*  Outputs:  files of magnitude differences
*            SNR in dB
*
************************************************************************/


#include "stdio.h"
#include "math.h"
#include "sim15.c"
#include "mult.c"
#include "wino15.c"

#define SIZE    14
#define INMASK  037777777
#define RNDMASK 040
#define PI      3.14159265358979

double log10 ();

main ()
{
    double dbx, dby, simrsig, simisig, snrsr, snrsi, temp1, temp2;
    double diffx, diffy, snrx, snry, a[SIZE+1], b[SIZE+1];
    double realsig, imagsig, realnoise, imagnoise, tempx, tempy;
    long x[SIZE+1], y[SIZE+1];
    int j, k, n, h[SIZE+1], rxbit, rybit, signx, signy;
    char *outfname;
    double snrx1, snrx2, snry1, snry2;
    FILE *gp, *hp, *fopen ();

    hp = fopen ("snr15", "w");    /* open file for output SNRs */
    srand (1);
    for (n = 0; n <= 99; n++)
    {

/************************************************************************
*
*  Assign a unique filename to each output file.
*
************************************************************************/

        switch (n)
        {
        case 0: outfname = "../output15/result00_15"; break;
        case 1: outfname = "../output15/result01_15"; break;
        case 2: outfname = "../output15/result02_15"; break;
        case 3: outfname = "../output15/result03_15"; break;
        case 4: outfname = "../output15/result04_15"; break;
        case 5: outfname = "../output15/result05_15"; break;
        case 6: outfname = "../output15/result06_15"; break;
        case 7: outfname = "../output15/result07_15"; break;
        case 8: outfname = "../output15/result08_15"; break;
        case 9: outfname = "../output15/result09_15"; break;
        case 10: outfname = "../output15/result10_15"; break;
        case 11: outfname = "../output15/result11_15"; break;
        case 12: outfname = "../output15/result12_15"; break;
        case 13: outfname = "../output15/result13_15"; break;
        case 14: outfname = "../output15/result14_15"; break;
        case 15: outfname = "../output15/result15_15"; break;
        case 16: outfname = "../output15/result16_15"; break;
        case 17: outfname = "../output15/result17_15"; break;
        case 18: outfname = "../output15/result18_15"; break;
        case 19: outfname = "../output15/result19_15"; break;
        case 20: outfname = "../output15/result20_15"; break;
        case 21: outfname = "../output15/result21_15"; break;
        case 22: outfname = "../output15/result22_15"; break;
        case 23: outfname = "../output15/result23_15"; break;
        case 24: outfname = "../output15/result24_15"; break;
        case 25: outfname = "../output15/result25_15"; break;
        case 26: outfname = "../output15/result26_15"; break;
        case 27: outfname = "../output15/result27_15"; break;
        case 28: outfname = "../output15/result28_15"; break;
        case 29: outfname = "../output15/result29_15"; break;
        case 30: outfname = "../output15/result30_15"; break;
        case 31: outfname = "../output15/result31_15"; break;
        case 32: outfname = "../output15/result32_15"; break;
        case 33: outfname = "../output15/result33_15"; break;
        case 34: outfname = "../output15/result34_15"; break;
        case 35: outfname = "../output15/result35_15"; break;
        case 36: outfname = "../output15/result36_15"; break;
        case 37: outfname = "../output15/result37_15"; break;
        case 38: outfname = "../output15/result38_15"; break;
        case 39: outfname = "../output15/result39_15"; break;
        case 40: outfname = "../output15/result40_15"; break;
        case 41: outfname = "../output15/result41_15"; break;
        case 42: outfname = "../output15/result42_15"; break;
        case 43: outfname = "../output15/result43_15"; break;
        case 44: outfname = "../output15/result44_15"; break;
        case 45: outfname = "../output15/result45_15"; break;
        case 46: outfname = "../output15/result46_15"; break;
        case 47: outfname = "../output15/result47_15"; break;
        case 48: outfname = "../output15/result48_15"; break;
        case 49: outfname = "../output15/result49_15"; break;
        case 50: outfname = "../output15/result50_15"; break;
        case 51: outfname = "../output15/result51_15"; break;
        case 52: outfname = "../output15/result52_15"; break;
        case 53: outfname = "../output15/result53_15"; break;
        case 54: outfname = "../output15/result54_15"; break;
        case 55: outfname = "../output15/result55_15"; break;
        case 56: outfname = "../output15/result56_15"; break;
        case 57: outfname = "../output15/result57_15"; break;
        case 58: outfname = "../output15/result58_15"; break;
        case 59: outfname = "../output15/result59_15"; break;
        case 60: outfname = "../output15/result60_15"; break;
        case 61: outfname = "../output15/result61_15"; break;
        case 62: outfname = "../output15/result62_15"; break;
        case 63: outfname = "../output15/result63_15"; break;
        case 64: outfname = "../output15/result64_15"; break;
        case 65: outfname = "../output15/result65_15"; break;
        case 66: outfname = "../output15/result66_15"; break;
        case 67: outfname = "../output15/result67_15"; break;
        case 68: outfname = "../output15/result68_15"; break;
        case 69: outfname = "../output15/result69_15"; break;
        case 70: outfname = "../output15/result70_15"; break;
        case 71: outfname = "../output15/result71_15"; break;
        case 72: outfname = "../output15/result72_15"; break;
        case 73: outfname = "../output15/result73_15"; break;
        case 74: outfname = "../output15/result74_15"; break;
        case 75: outfname = "../output15/result75_15"; break;
        case 76: outfname = "../output15/result76_15"; break;
        case 77: outfname = "../output15/result77_15"; break;
        case 78: outfname = "../output15/result78_15"; break;
        case 79: outfname = "../output15/result79_15"; break;
        case 80: outfname = "../output15/result80_15"; break;
        case 81: outfname = "../output15/result81_15"; break;
        case 82: outfname = "../output15/result82_15"; break;
        case 83: outfname = "../output15/result83_15"; break;
        case 84: outfname = "../output15/result84_15"; break;
        case 85: outfname = "../output15/result85_15"; break;
        case 86: outfname = "../output15/result86_15"; break;
        case 87: outfname = "../output15/result87_15"; break;
        case 88: outfname = "../output15/result88_15"; break;
        case 89: outfname = "../output15/result89_15"; break;
        case 90: outfname = "../output15/result90_15"; break;
        case 91: outfname = "../output15/result91_15"; break;
        case 92: outfname = "../output15/result92_15"; break;
        case 93: outfname = "../output15/result93_15"; break;
        case 94: outfname = "../output15/result94_15"; break;
        case 95: outfname = "../output15/result95_15"; break;
        case 96: outfname = "../output15/result96_15"; break;
        case 97: outfname = "../output15/result97_15"; break;
        case 98: outfname = "../output15/result98_15"; break;
        case 99: outfname = "../output15/result99_15"; break;
        }   /* end switch */

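/************************************************************************
*
*  Fill the input array with random numbers.
*
************************************************************************/

/************************************************************************
*
*  Scale the simulation input data and initialize the index
*  array.
*
************************************************************************/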


        for (j = 0; j <= SIZE; j++)
        {
            x[j] = x[j] << 2;    /* zero fill and sign extend */
            y[j] = y[j] << 2;
        }
        sim15 (x, y, h);
        std15 (a, b, h);

/************************************************************************
*
*  Take the 23 most significant bits of the simulation result.
*
************************************************************************/

        for (j = 0; j <= SIZE; j++)
        {
            signx = 1;
            signy = 1;
            if (x[j] < 0)
            {
                x[j] = -(x[j]);
                signx = -1;
            }
            if (y[j] < 0)
            {
                y[j] = -(y[j]);
                signy = -1;
            }
            rxbit = x[j] & RNDMASK;
            rybit = y[j] & RNDMASK;
            x[j] = x[j] >> 6;
            y[j] = y[j] >> 6;
            if (rxbit != 0) x[j]++;
            if (rybit != 0) y[j]++;
            x[j] = x[j] * signx;
            y[j] = y[j] * signy;
        }


/************************************************************************
*
*  Compute the differences between the standard and simulation
*  results (noise components; realnoise and imagnoise).  Compute
*  the SNR for real and imaginary results by dividing the sum of
*  the standard results (signal components; realsig and imagsig)
*  by the sum of the noise components.  Send the real and imagi-
*  nary SNRs to a file containing SNRs for all inputs; send the
*  differences to a file for storage.
*
************************************************************************/
        gp = fopen (outfname, "w");
        realsig = 0;
        imagsig = 0;
        realnoise = 0;
        imagnoise = 0;
        simrsig = simisig = 0.0;
        for (j = 0; j <= SIZE; j++)
        {
            tempx = x[j] << 1;
            tempy = y[j] << 1;
            simrsig = simrsig + (tempx * tempx);
            simisig = simisig + (tempy * tempy);
            temp1 = a[j] / 8.0;    /* scale standard outputs down */
            temp2 = b[j] / 8.0;    /* by a factor of 8 to account */
                                   /* for the difference in input */
                                   /* scaling (2), output rounding */
                                   /* (6), and multiplying (1) of */
                                   /* simulation; 6-1-2 = 3; 2**3 = 8 */
            diffx = temp1 - x[j];
            diffy = temp2 - y[j];
            realsig = realsig + 4.0 * (temp1 * temp1);
            imagsig = imagsig + 4.0 * (temp2 * temp2);
            realnoise = realnoise + (diffx * diffx);
            imagnoise = imagnoise + (diffy * diffy);
            dbx = 138.0;
            dby = 138.0;
            if ((diffx != 0.0) && (a[j] != 0.0))
                dbx = 10.0 * (log10 (temp1 * temp1) - log10 (diffx * diffx));
            if ((diffy != 0.0) && (b[j] != 0.0))
                dby = 10.0 * (log10 (temp2 * temp2) - log10 (diffy * diffy));
            fprintf (gp, "%d %20.10f%20.10f\n", j, dbx, dby);
        }
        printf ("Finished transferring output to %s\n", outfname);
        fclose (gp);


        snrsr = snrsi = 0.0;
        snrsr = 10.0 * log10 (simrsig);
        snrsi = 10.0 * log10 (simisig);
        snrx1 = 10.0 * log10 (realsig);
        snrx2 = 10.0 * log10 (realnoise);
        snry1 = 10.0 * log10 (imagsig);
        snry2 = 10.0 * log10 (imagnoise);
        snrx = snrx1 - snrx2;
        snry = snry1 - snry2;

        printf ("simrsig = %20.10f simisig = %20.10f\n", snrsr, snrsi);
        printf ("realsig = %20.10f imagsig = %20.10f\n", snrx1, snry1);
        printf ("realnoise = %20.10f imagnoise = %20.10f\n", snrx2, snry2);
        fprintf (hp, "%d %20.10f %20.10f\n", n, snrx, snry);
    }   /* end n loop */
}   /* end main */


B.7 DIFF16.C


/************************************************************************
*
*  Program:  diff16.c
*
*  Author:   Kent Taylor
*
*  Date:     20 November 1985
*
*  Purpose:  To compare the outputs of simulation and standard
*            implementations of Winograd's 16-point DFT algorithm.
*            The simulation routine uses integer arithmetic,
*            while the standard uses double precision real
*            arithmetic.  The comparison is done using
*            signal-to-noise ratio (SNR) as the measure.  The
*            SNR is computed by summing the magnitude of all
*            signal components (standard outputs) and dividing
*            by the sum of all the noise components (standard
*            minus simulation).  The SNR is expressed in dB
*            (10 log ratio); the magnitude of noise is also
*            stored for comparison.
*
*  Inputs:   files of random numbers
*
*  Outputs:  files of magnitude differences
*            SNR in dB
*
************************************************************************/


#include "stdio.h"
#include "math.h"
#include "sim16.c"
#include "mult.c"
#include "wino16.c"

#define SIZE    15
#define MASK    077777777
#define INMASK  037777777
#define NUMBER  8388608

#detine 

OUTMASK 

U.J  /  i  i  i  !  7  7  UOU 

#define RNDMASK 0400

long mult ();
double log10 ();

main ()
{
    double simrsig, simisig, snrsr, snrsi, temp1, temp2, tempx, tempy;
    double dbx, dby, snrx1, snrx2, snry1, snry2;
    double diffx, diffy, snrx, snry, a[SIZE+1], b[SIZE+1];
    double realsig, imagsig, realnoise, imagnoise;
    long x[SIZE+1], y[SIZE+1];
    int j, k, n, nnn, m, h[SIZE+1], rxbit, rybit, signx, signy;
    char *outfname;
    FILE *gp, *hp, *fopen ();

    hp = fopen ("snr16", "w");    /* open file for output SNRs */
    for (n = 0; n <= 99; n++)
    {

/************************************************************************
*
*  Assign a unique filename to each output file.
*
************************************************************************/

        switch (n)
        {
        case 0: outfname = "../output16/result00_16"; break;
        case 1: outfname = "../output16/result01_16"; break;
        case 2: outfname = "../output16/result02_16"; break;
        case 3: outfname = "../output16/result03_16"; break;
        case 4: outfname = "../output16/result04_16"; break;
        case 5: outfname = "../output16/result05_16"; break;
        case 6: outfname = "../output16/result06_16"; break;
        case 7: outfname = "../output16/result07_16"; break;
        case 8: outfname = "../output16/result08_16"; break;
        case 9: outfname = "../output16/result09_16"; break;
        case 10: outfname = "../output16/result10_16"; break;
        case 11: outfname = "../output16/result11_16"; break;
        case 12: outfname = "../output16/result12_16"; break;
        case 13: outfname = "../output16/result13_16"; break;
        case 14: outfname = "../output16/result14_16"; break;
        case 15: outfname = "../output16/result15_16"; break;
        case 16: outfname = "../output16/result16_16"; break;
        case 17: outfname = "../output16/result17_16"; break;
        case 18: outfname = "../output16/result18_16"; break;
        case 19: outfname = "../output16/result19_16"; break;
        case 20: outfname = "../output16/result20_16"; break;
        case 21: outfname = "../output16/result21_16"; break;
        case 22: outfname = "../output16/result22_16"; break;
        case 23: outfname = "../output16/result23_16"; break;
        case 24: outfname = "../output16/result24_16"; break;
        case 25: outfname = "../output16/result25_16"; break;
        case 26: outfname = "../output16/result26_16"; break;
        case 27: outfname = "../output16/result27_16"; break;
        case 28: outfname = "../output16/result28_16"; break;
        case 29: outfname = "../output16/result29_16"; break;
        case 30: outfname = "../output16/result30_16"; break;
        case 31: outfname = "../output16/result31_16"; break;
        case 32: outfname = "../output16/result32_16"; break;
        case 33: outfname = "../output16/result33_16"; break;
        case 34: outfname = "../output16/result34_16"; break;
        case 35: outfname = "../output16/result35_16"; break;
        case 36: outfname = "../output16/result36_16"; break;
        case 37: outfname = "../output16/result37_16"; break;
        case 38: outfname = "../output16/result38_16"; break;
        case 39: outfname = "../output16/result39_16"; break;
        case 40: outfname = "../output16/result40_16"; break;
        case 41: outfname = "../output16/result41_16"; break;
        case 42: outfname = "../output16/result42_16"; break;
        case 43: outfname = "../output16/result43_16"; break;
        case 44: outfname = "../output16/result44_16"; break;
        case 45: outfname = "../output16/result45_16"; break;
        case 46: outfname = "../output16/result46_16"; break;
        case 47: outfname = "../output16/result47_16"; break;
        case 48: outfname = "../output16/result48_16"; break;
        case 49: outfname = "../output16/result49_16"; break;
        case 50: outfname = "../output16/result50_16"; break;
        case 51: outfname = "../output16/result51_16"; break;
        case 52: outfname = "../output16/result52_16"; break;
        case 53: outfname = "../output16/result53_16"; break;
        case 54: outfname = "../output16/result54_16"; break;
        case 55: outfname = "../output16/result55_16"; break;
        case 56: outfname = "../output16/result56_16"; break;
        case 57: outfname = "../output16/result57_16"; break;
        case 58: outfname = "../output16/result58_16"; break;
        case 59: outfname = "../output16/result59_16"; break;
        case 60: outfname = "../output16/result60_16"; break;
        case 61: outfname = "../output16/result61_16"; break;
        case 62: outfname = "../output16/result62_16"; break;
        case 63: outfname = "../output16/result63_16"; break;
        case 64: outfname = "../output16/result64_16"; break;
        case 65: outfname = "../output16/result65_16"; break;
        case 66: outfname = "../output16/result66_16"; break;
        case 67: outfname = "../output16/result67_16"; break;
        case 68: outfname = "../output16/result68_16"; break;
        case 69: outfname = "../output16/result69_16"; break;
        case 70: outfname = "../output16/result70_16"; break;
        case 71: outfname = "../output16/result71_16"; break;
        case 72: outfname = "../output16/result72_16"; break;
        case 73: outfname = "../output16/result73_16"; break;
        case 74: outfname = "../output16/result74_16"; break;
        case 75: outfname = "../output16/result75_16"; break;
        case 76: outfname = "../output16/result76_16"; break;
        case 77: outfname = "../output16/result77_16"; break;
        case 78: outfname = "../output16/result78_16"; break;
        case 79: outfname = "../output16/result79_16"; break;
        case 80: outfname = "../output16/result80_16"; break;
        case 81: outfname = "../output16/result81_16"; break;
        case 82: outfname = "../output16/result82_16"; break;
        case 83: outfname = "../output16/result83_16"; break;
        case 84: outfname = "../output16/result84_16"; break;
        case 85: outfname = "../output16/result85_16"; break;
        case 86: outfname = "../output16/result86_16"; break;
        case 87: outfname = "../output16/result87_16"; break;
        case 88: outfname = "../output16/result88_16"; break;
        case 89: outfname = "../output16/result89_16"; break;
        case 90: outfname = "../output16/result90_16"; break;
        case 91: outfname = "../output16/result91_16"; break;
        case 92: outfname = "../output16/result92_16"; break;
        case 93: outfname = "../output16/result93_16"; break;
        case 94: outfname = "../output16/result94_16"; break;
        case 95: outfname = "../output16/result95_16"; break;
        case 96: outfname = "../output16/result96_16"; break;
        case 97: outfname = "../output16/result97_16"; break;
        case 98: outfname = "../output16/result98_16"; break;
        case 99: outfname = "../output16/result99_16"; break;
        }   /* end switch */

/************************************************************************
*
*  Fill the input array with random numbers.
*
************************************************************************/

        for (j = 0; j <= SIZE; j++)
        {
            x[j] = (rand () & INMASK) - 4194304;
            y[j] = (rand () & INMASK) - 4194304;
            a[j] = x[j];
            b[j] = y[j];
        }

/************************************************************************
*
*  Scale the simulation input data and initialize the index
*  array.
*
************************************************************************/


        realsig = imagsig = 0.0;
        for (j = 0; j <= SIZE; j++)
        {
            x[j] = x[j] << 5;    /* zero fill and sign extend */
            y[j] = y[j] << 5;
            h[j] = j;            /* initialize index array */
        }
        sim16 (x, y, h);
        std16 (a, b, h);

/************************************************************************
*
*  Take the 23 most significant bits of the simulation result.
*
************************************************************************/

        for (j = 0; j <= SIZE; j++)
        {
            signx = 1;
            signy = 1;
            if (x[j] < 0)
            {
                x[j] = -(x[j]);
                signx = -1;
            }
            if (y[j] < 0)
            {
                y[j] = -(y[j]);
                signy = -1;
            }
            rxbit = x[j] & RNDMASK;
            rybit = y[j] & RNDMASK;
            x[j] = x[j] >> 9;
            y[j] = y[j] >> 9;
            if (rxbit != 0) x[j]++;
            if (rybit != 0) y[j]++;
            x[j] = x[j] * signx;
            y[j] = y[j] * signy;
        }

/************************************************************************
*
*  Compute the differences between the standard and simulation
*  results (noise components: realnoise and imagnoise).  Compute
*  the SNR for real and imaginary results by dividing the sum of
*  the standard results (signal components: realsig and imagsig)
*  by the sum of the noise components.  Send the real and imagi-
*  nary SNRs to a file containing SNRs for all inputs; send the
*  differences to a file for storage.
*
************************************************************************/

        gp = fopen (outfname, "w");
        realsig = 0;
        imagsig = 0;
        realnoise = 0;
        imagnoise = 0;
        simrsig = simisig = 0;
        for (j = 0; j <= SIZE; j++)
        {
            tempx = x[j];
            tempy = y[j];
            simrsig = simrsig + (tempx * tempx);
            simisig = simisig + (tempy * tempy);
            temp1 = a[j] / 8.0;    /* scale standard outputs down by */
            temp2 = b[j] / 8.0;    /* a factor of 8 to account for */
                                   /* input scaling (5), output */
                                   /* rounding (9), and multiplying */
                                   /* (1) for simulation; 9-5-1 = 3; */
                                   /* 2**3 = 8 */
            diffx = temp1 - x[j];
            diffy = temp2 - y[j];
            realsig = realsig + (temp1 * temp1);
            imagsig = imagsig + (temp2 * temp2);
            realnoise = realnoise + (diffx * diffx);
            imagnoise = imagnoise + (diffy * diffy);
            dbx = 138.0;
            dby = 138.0;
            if (diffx != 0.0)
                dbx = -10.0 * log10 ((diffx * diffx) / (a[j] * a[j]));
            if (diffy != 0.0)
                dby = -10.0 * log10 ((diffy * diffy) / (b[j] * b[j]));
            fprintf (gp, "%d %20.10f%20.10f\n", j, dbx, dby);
        }
        printf ("Finished transferring output to %s\n", outfname);
        fclose (gp);

        snrsr = 10.0 * log10 (simrsig);
        snrsi = 10.0 * log10 (simisig);
        snrx1 = 10.0 * log10 (realsig);
        snry1 = 10.0 * log10 (imagsig);
        snrx2 = 10.0 * log10 (realnoise);
        snry2 = 10.0 * log10 (imagnoise);
        snrx = snrx1 - snrx2;
        snry = snry1 - snry2;

        printf ("simrsig = %20.10f simisig = %20.10f\n", snrsr, snrsi);
        printf ("realsig = %20.10f imagsig = %20.10f\n", snrx1, snry1);
        printf ("realnoise = %20.10f imagnoise = %20.10f\n", snrx2, snry2);
        fprintf (hp, "%d %20.10f%20.10f\n", n, snrx, snry);
    }   /* end n loop */
}   /* end main */
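The "take the 23 most significant bits" step in both programs rounds in sign-magnitude form: the value is made positive, the bit just below the cut is tested, the value is shifted right, incremented if that bit was set, and the sign is restored. A minimal sketch of that step as a standalone function follows. The name `round_shift` and its parameters are assumptions for illustration, and the function assumes `shift` is at least 1.

```c
/* Hypothetical helper: drop the low `shift` bits of value, rounding
   the magnitude up when the highest dropped bit is set. */
static long round_shift(long value, int shift)
{
    long sign = 1;
    long round_mask = 1L << (shift - 1);  /* 040 for shift 6, 0400 for shift 9 */
    long round_bit;

    if (value < 0) {          /* work on the magnitude */
        value = -value;
        sign = -1;
    }
    round_bit = value & round_mask;
    value = value >> shift;
    if (round_bit != 0)
        value++;
    return value * sign;      /* restore the sign */
}
```

Working on the magnitude keeps the rounding symmetric about zero and sidesteps the implementation-defined result of right-shifting a negative `long` in C.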




main ()
{
    double dbx, dby, simrsig, simisig, snrsr, snrsi, temp1, temp2;
    double diffx, diffy, snrx, snry, a[SIZE+1], b[SIZE+1];
    double realsig, imagsig, realnoise, imagnoise, tempx, tempy;
    long x[SIZE+1], y[SIZE+1];
    int j, k, n, h[SIZE+1], rxbit, rybit, signx, signy;
    char *outfname;
    double snrx1, snrx2, snry1, snry2;
    FILE *gp, *hp, *fopen ();

    hp = fopen ("snr17", "w");    /* open file for output SNRs */
    srand (1);
    for (n = 0; n <= 99; n++)
    {

/************************************************************************
*
*  Assign a unique filename to each output file.
*
************************************************************************/

switch  (n) 

{ 

case 0: outfname = "../output17/result00_17"; break;
case 1: outfname = "../output17/result01_17"; break;
case 2: outfname = "../output17/result02_17"; break;
case 3: outfname = "../output17/result03_17"; break;
case 4: outfname = "../output17/result04_17"; break;
case 5: outfname = "../output17/result05_17"; break;
case 6: outfname = "../output17/result06_17"; break;
case 7: outfname = "../output17/result07_17"; break;
case 8: outfname = "../output17/result08_17"; break;
case 9: outfname = "../output17/result09_17"; break;
case 10: outfname = "../output17/result10_17"; break;
case 11: outfname = "../output17/result11_17"; break;
case 12: outfname = "../output17/result12_17"; break;
case 13: outfname = "../output17/result13_17"; break;
case 14: outfname = "../output17/result14_17"; break;
case 15: outfname = "../output17/result15_17"; break;
case 16: outfname = "../output17/result16_17"; break;
case 17: outfname = "../output17/result17_17"; break;
case 18: outfname = "../output17/result18_17"; break;
case 19: outfname = "../output17/result19_17"; break;
case 20: outfname = "../output17/result20_17"; break;
case 21: outfname = "../output17/result21_17"; break;
case 22: outfname = "../output17/result22_17"; break;
case 23: outfname = "../output17/result23_17"; break;
case 24: outfname = "../output17/result24_17"; break;
case 25: outfname = "../output17/result25_17"; break;
case 26: outfname = "../output17/result26_17"; break;
case 27: outfname = "../output17/result27_17"; break;
case 28: outfname = "../output17/result28_17"; break;
case 29: outfname = "../output17/result29_17"; break;
case 30: outfname = "../output17/result30_17"; break;
case 31: outfname = "../output17/result31_17"; break;
case 32: outfname = "../output17/result32_17"; break;
case 33: outfname = "../output17/result33_17"; break;
case 34: outfname = "../output17/result34_17"; break;
case 35: outfname = "../output17/result35_17"; break;
case 36: outfname = "../output17/result36_17"; break;
case 37: outfname = "../output17/result37_17"; break;
case 38: outfname = "../output17/result38_17"; break;
case 39: outfname = "../output17/result39_17"; break;
case 40: outfname = "../output17/result40_17"; break;
case 41: outfname = "../output17/result41_17"; break;
case 42: outfname = "../output17/result42_17"; break;
case 43: outfname = "../output17/result43_17"; break;
case 44: outfname = "../output17/result44_17"; break;
case 45: outfname = "../output17/result45_17"; break;
case 46: outfname = "../output17/result46_17"; break;
case 47: outfname = "../output17/result47_17"; break;
case 48: outfname = "../output17/result48_17"; break;
case 49: outfname = "../output17/result49_17"; break;
case 50: outfname = "../output17/result50_17"; break;
case 51: outfname = "../output17/result51_17"; break;
case 52: outfname = "../output17/result52_17"; break;
case 53: outfname = "../output17/result53_17"; break;
case 54: outfname = "../output17/result54_17"; break;
case 55: outfname = "../output17/result55_17"; break;
case 56: outfname = "../output17/result56_17"; break;
case 57: outfname = "../output17/result57_17"; break;
case 58: outfname = "../output17/result58_17"; break;
case 59: outfname = "../output17/result59_17"; break;
case 60: outfname = "../output17/result60_17"; break;
case 61: outfname = "../output17/result61_17"; break;
case 62: outfname = "../output17/result62_17"; break;
case 63: outfname = "../output17/result63_17"; break;
case 64: outfname = "../output17/result64_17"; break;
case 65: outfname = "../output17/result65_17"; break;
case 66: outfname = "../output17/result66_17"; break;
case 67: outfname = "../output17/result67_17"; break;
case 68: outfname = "../output17/result68_17"; break;
case 69: outfname = "../output17/result69_17"; break;
case 70: outfname = "../output17/result70_17"; break;
case 71: outfname = "../output17/result71_17"; break;
case 72: outfname = "../output17/result72_17"; break;
case 73: outfname = "../output17/result73_17"; break;
case 74: outfname = "../output17/result74_17"; break;
case 75: outfname = "../output17/result75_17"; break;
case 76: outfname = "../output17/result76_17"; break;
case 77: outfname = "../output17/result77_17"; break;
case 78: outfname = "../output17/result78_17"; break;
case 79: outfname = "../output17/result79_17"; break;
case 80: outfname = "../output17/result80_17"; break;
case 81: outfname = "../output17/result81_17"; break;
case 82: outfname = "../output17/result82_17"; break;
case 83: outfname = "../output17/result83_17"; break;
case 84: outfname = "../output17/result84_17"; break;
case 85: outfname = "../output17/result85_17"; break;
case 86: outfname = "../output17/result86_17"; break;
case 87: outfname = "../output17/result87_17"; break;
case 88: outfname = "../output17/result88_17"; break;
case 89: outfname = "../output17/result89_17"; break;
case 90: outfname = "../output17/result90_17"; break;
case 91: outfname = "../output17/result91_17"; break;
case 92: outfname = "../output17/result92_17"; break;
case 93: outfname = "../output17/result93_17"; break;
case 94: outfname = "../output17/result94_17"; break;
case 95: outfname = "../output17/result95_17"; break;
case 96: outfname = "../output17/result96_17"; break;
case 97: outfname = "../output17/result97_17"; break;
case 98: outfname = "../output17/result98_17"; break;
case 99: outfname = "../output17/result99_17"; break;

}  /*  end  switch  */ 

/************************************************************************
*
*  Fill the input array with random numbers.
*
************************************************************************/

for (j = 0; j <= SIZE; j++)
{
    x[j] = (rand () & INMASK) - 4194304;
    y[j] = (rand () & INMASK) - 4194304;
    a[j] = x[j];
    b[j] = y[j];
}

/************************************************************************
*
*  Scale the simulation input data and initialize the index
*  array.
*
************************************************************************/

for (j = 0; j <= SIZE; j++)
{
    x[j] = x[j] << 3;    /* zero fill and sign extend */
    y[j] = y[j] << 3;
    h[j] = j;            /* initialize index array */
}

sim17 (x, y, h);
std17 (a, b, h);

/************************************************************************
*
*  Take the 23 most significant bits of the simulation result.
*
************************************************************************/

for (j = 0; j <= SIZE; j++)
{
    signx = 1;
    signy = 1;
    if (x[j] < 0)
    {
        x[j] = -x[j];
        signx = -1;
    }
    if (y[j] < 0)
    {
        y[j] = -y[j];
        signy = -1;
    }
    rxbit = x[j] & RNDMASK;
    rybit = y[j] & RNDMASK;
    x[j] = x[j] >> 8;
    y[j] = y[j] >> 8;
    if (rxbit != 0) x[j]++;
    if (rybit != 0) y[j]++;
    x[j] = x[j] * signx;
    y[j] = y[j] * signy;
}

/************************************************************************
*
*  Compute the differences between the standard and simulation
*  results (noise components; realnoise and imagnoise). Compute
*  the SNR for real and imaginary results by dividing the sum of
*  the standard results (signal components; realsig and imagsig)
*  by the sum of the noise components. Send the real and imaginary
*  SNRs to a file containing SNRs for all inputs; send the
*  differences to a file for storage.
*
************************************************************************/


gp = fopen (outfname, "w");

realsig = 0;
imagsig = 0;
realnoise = 0;
imagnoise = 0;
simrsig = simisig = 0.0;

for (j = 0; j <= SIZE; j++)
{
    tempx = x[j] << 2;
    tempy = y[j] << 2;
    simrsig = simrsig + (tempx * tempx);
    simisig = simisig + (tempy * tempy);
    temp1 = a[j] / 16.0;    /* scale standard outputs down */
    temp2 = b[j] / 16.0;    /* by a factor of 16 to account */
                            /* for the difference in input */
                            /* scaling (3), output rounding */
                            /* (8), and multiplying (1) of */
                            /* simulation; 8-1-3 = 4; 2**4 = 16 */
    diffx = temp1 - x[j];
    diffy = temp2 - y[j];
    realsig = realsig + 16.0 * (temp1 * temp1);
    imagsig = imagsig + 16.0 * (temp2 * temp2);
    realnoise = realnoise + (diffx * diffx);
    imagnoise = imagnoise + (diffy * diffy);
    dbx = 138.0;
    dby = 138.0;
    if ((diffx != 0.0) && (a[j] != 0.0))
        dbx = 10.0 * (log10 (temp1 * temp1) - log10 (diffx * diffx));
    if ((diffy != 0.0) && (b[j] != 0.0))
        dby = 10.0 * (log10 (temp2 * temp2) - log10 (diffy * diffy));
    fprintf (gp, "%d %20.10f%20.10f\n", j, dbx, dby);
}

printf (" Finished transferring output to %s\n", outfname);
fclose (gp);

snrx1 = snry1 = 138.0;
snrx2 = snry2 = 0.0;
snrsr = snrsi = 0.0;
snrsr = 10.0 * log10 (simrsig);
snrsi = 10.0 * log10 (simisig);
snrx1 = 10.0 * log10 (realsig);
snrx2 = 10.0 * log10 (realnoise);
snry1 = 10.0 * log10 (imagsig);
snry2 = 10.0 * log10 (imagnoise);
snrx = snrx1 - snrx2;
snry = snry1 - snry2;

printf ("simrsig = %20.10f simisig = %20.10f\n", snrsr, snrsi);
printf ("realsig = %20.10f imagsig = %20.10f\n", snrx1, snry1);
printf ("realnoise = %20.10f imagnoise = %20.10f\n", snrx2, snry2);
fprintf (hp, "%d %20.10f %20.10f\n", n, snrx, snry);

} /* end n loop */

} /* end main */
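
The rounding step in the listing above keeps the most significant bits of each simulation output by working on the magnitude and restoring the sign afterwards. The sketch below isolates that step as one function so its effect on individual values can be checked. The function name round_shift8 and the mask value 0x80 (the most significant discarded bit) are assumptions made for illustration; the listing's own RNDMASK is defined outside this section and may differ.

```c
#include <assert.h>

/* Assumed mask for the most significant discarded bit (bit 7).
 * The listing's RNDMASK is defined elsewhere and may not match. */
#define ROUND_BIT 0x80L

/* Drop the 8 low-order bits of v. The magnitude is rounded up when
 * the round bit is set, then the original sign is restored. */
long round_shift8(long v)
{
    long sign = 1;
    long mag = v;
    long round;

    if (mag < 0) {
        mag = -mag;            /* work on the magnitude only */
        sign = -1;
    }
    round = mag & ROUND_BIT;   /* inspect the round bit before shifting */
    mag = mag >> 8;            /* keep the most significant bits */
    if (round != 0)
        mag++;                 /* round the magnitude up */
    return mag * sign;
}
```

Under these assumptions positive and negative inputs round symmetrically: 384 maps to 2 and -384 maps to -2, which a plain arithmetic right shift of a negative value would not guarantee.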


B.9 STDDIFF_16.C


/************************************************************************
*
*  Program:  stddiff_16.c
*  Author:   Kent Taylor
*  Date:     31 October 1985
*
*  Purpose:  To compare the outputs of direct DFT and standard
*            implementation of Winograd's 16-point DFT algorithm.
*            Both routines use double precision arithmetic.
*            The comparison is done using signal-to-noise
*            ratio (SNR) as the measure. The SNR is computed
*            by summing the magnitude of all signal components
*            (DFT outputs) and dividing by the sum of all the
*            noise components (DFT minus Winograd). The SNR
*            is expressed in dB (10 log ratio); the individual
*            SNRs are also stored for comparison.
*
*  Inputs:   files of random numbers
*
*  Outputs:  files of individual SNRs
*            SNR in dB
*
************************************************************************/


#include "stdio.h"
#include "math.h"
#include "wino16.c"

#define SIZE 15
#define INMASK
#define PI 3.141592653589

double sin ();
double cos ();
double hypot ();
double log10 ();

main ()
{

double x[SIZE+1], y[SIZE+1], x1, xx;
double xa[SIZE+1], yb[SIZE+1];
double dbx, dby;
double diffx, diffy, snrx, snry, a[SIZE+1], b[SIZE+1];
double realsig, imagsig, realnoise, imagnoise;
double dbxmin, dbymin;
int j, k, n, h[SIZE+1];
char *outfname;
FILE *gp, *hp, *fopen ();

hp = fopen ("dsnr16", "w");    /* open file for output SNRs */

srand (1);

for  (n  =  0;  n  <=  99;  n++) 

{ 

/************************************************************************
*
*  Assign a unique filename to each output file.
*
************************************************************************/

switch  (n) 

{ 

case 0: outfname = "../doutput16/result00_16"; break;
case 1: outfname = "../doutput16/result01_16"; break;
case 2: outfname = "../doutput16/result02_16"; break;
case 3: outfname = "../doutput16/result03_16"; break;
case 4: outfname = "../doutput16/result04_16"; break;
case 5: outfname = "../doutput16/result05_16"; break;
case 6: outfname = "../doutput16/result06_16"; break;
case 7: outfname = "../doutput16/result07_16"; break;
case 8: outfname = "../doutput16/result08_16"; break;
case 9: outfname = "../doutput16/result09_16"; break;
case 10: outfname = "../doutput16/result10_16"; break;
case 11: outfname = "../doutput16/result11_16"; break;
case 12: outfname = "../doutput16/result12_16"; break;
case 13: outfname = "../doutput16/result13_16"; break;
case 14: outfname = "../doutput16/result14_16"; break;
case 15: outfname = "../doutput16/result15_16"; break;
case 16: outfname = "../doutput16/result16_16"; break;
case 17: outfname = "../doutput16/result17_16"; break;
case 18: outfname = "../doutput16/result18_16"; break;
case 19: outfname = "../doutput16/result19_16"; break;
case 20: outfname = "../doutput16/result20_16"; break;
case 21: outfname = "../doutput16/result21_16"; break;
case 22: outfname = "../doutput16/result22_16"; break;
case 23: outfname = "../doutput16/result23_16"; break;
case 24: outfname = "../doutput16/result24_16"; break;
case 25: outfname = "../doutput16/result25_16"; break;
case 26: outfname = "../doutput16/result26_16"; break;
case 27: outfname = "../doutput16/result27_16"; break;
case 28: outfname = "../doutput16/result28_16"; break;
case 29: outfname = "../doutput16/result29_16"; break;
case 30: outfname = "../doutput16/result30_16"; break;
case 31: outfname = "../doutput16/result31_16"; break;
case 32: outfname = "../doutput16/result32_16"; break;
case 33: outfname = "../doutput16/result33_16"; break;
case 34: outfname = "../doutput16/result34_16"; break;
case 35: outfname = "../doutput16/result35_16"; break;
case 36: outfname = "../doutput16/result36_16"; break;
case 37: outfname = "../doutput16/result37_16"; break;
case 38: outfname = "../doutput16/result38_16"; break;
case 39: outfname = "../doutput16/result39_16"; break;
case 40: outfname = "../doutput16/result40_16"; break;
case 41: outfname = "../doutput16/result41_16"; break;
case 42: outfname = "../doutput16/result42_16"; break;
case 43: outfname = "../doutput16/result43_16"; break;
case 44: outfname = "../doutput16/result44_16"; break;
case 45: outfname = "../doutput16/result45_16"; break;
case 46: outfname = "../doutput16/result46_16"; break;
case 47: outfname = "../doutput16/result47_16"; break;
case 48: outfname = "../doutput16/result48_16"; break;
case 49: outfname = "../doutput16/result49_16"; break;
case 50: outfname = "../doutput16/result50_16"; break;
case 51: outfname = "../doutput16/result51_16"; break;
case 52: outfname = "../doutput16/result52_16"; break;
case 53: outfname = "../doutput16/result53_16"; break;
case 54: outfname = "../doutput16/result54_16"; break;
case 55: outfname = "../doutput16/result55_16"; break;
case 56: outfname = "../doutput16/result56_16"; break;
case 57: outfname = "../doutput16/result57_16"; break;
case 58: outfname = "../doutput16/result58_16"; break;
case 59: outfname = "../doutput16/result59_16"; break;
case 60: outfname = "../doutput16/result60_16"; break;
case 61: outfname = "../doutput16/result61_16"; break;
case 62: outfname = "../doutput16/result62_16"; break;
case 63: outfname = "../doutput16/result63_16"; break;
case 64: outfname = "../doutput16/result64_16"; break;
case 65: outfname = "../doutput16/result65_16"; break;
case 66: outfname = "../doutput16/result66_16"; break;
case 67: outfname = "../doutput16/result67_16"; break;
case 68: outfname = "../doutput16/result68_16"; break;
case 69: outfname = "../doutput16/result69_16"; break;
case 70: outfname = "../doutput16/result70_16"; break;
case 71: outfname = "../doutput16/result71_16"; break;
case 72: outfname = "../doutput16/result72_16"; break;
case 73: outfname = "../doutput16/result73_16"; break;
case 74: outfname = "../doutput16/result74_16"; break;
case 75: outfname = "../doutput16/result75_16"; break;
case 76: outfname = "../doutput16/result76_16"; break;
case 77: outfname = "../doutput16/result77_16"; break;
case 78: outfname = "../doutput16/result78_16"; break;
case 79: outfname = "../doutput16/result79_16"; break;
case 80: outfname = "../doutput16/result80_16"; break;
case 81: outfname = "../doutput16/result81_16"; break;
case 82: outfname = "../doutput16/result82_16"; break;
case 83: outfname = "../doutput16/result83_16"; break;
case 84: outfname = "../doutput16/result84_16"; break;
case 85: outfname = "../doutput16/result85_16"; break;
case 86: outfname = "../doutput16/result86_16"; break;
case 87: outfname = "../doutput16/result87_16"; break;
case 88: outfname = "../doutput16/result88_16"; break;
case 89: outfname = "../doutput16/result89_16"; break;
case 90: outfname = "../doutput16/result90_16"; break;
case 91: outfname = "../doutput16/result91_16"; break;
case 92: outfname = "../doutput16/result92_16"; break;
case 93: outfname = "../doutput16/result93_16"; break;
case 94: outfname = "../doutput16/result94_16"; break;
case 95: outfname = "../doutput16/result95_16"; break;
case 96: outfname = "../doutput16/result96_16"; break;
case 97: outfname = "../doutput16/result97_16"; break;
case 98: outfname = "../doutput16/result98_16"; break;
case 99: outfname = "../doutput16/result99_16"; break;

}  /*  end  switch  */ 

/************************************************************************
*
*  Fill the input array with random numbers.
*
************************************************************************/

for (j = 0; j <= SIZE; j++)
{
    x[j] = (rand () & INMASK) - 4194304;
    y[j] = (rand () & INMASK) - 4194304;
    a[j] = x[j];
    b[j] = y[j];
    h[j] = j;
}

std16 (a, b, h);

/* Compute the DFT directly */

for (j = 0; j <= SIZE; j++)
{
    xa[j] = yb[j] = 0;
    for (k = 0; k <= SIZE; k++)
    {
        x1 = (2.0 * PI) / 16;
        xx = k * j;
        xa[j] = xa[j] + (x[k] * cos (xx * x1));
        yb[j] = yb[j] - (x[k] * sin (xx * x1));
        yb[j] = yb[j] + (y[k] * cos (xx * x1));
        xa[j] = xa[j] + (y[k] * sin (xx * x1));
    }
}

/************************************************************************
*
*  Compute the differences between the standard and direct DFT
*  results (noise components; realnoise and imagnoise). Compute
*  the SNR for real and imaginary results by dividing the sum of
*  the standard results (signal components; realsig and imagsig)
*  by the sum of the noise components. Send the real and imaginary
*  SNRs to a file containing SNRs for all inputs; send the
*  differences to a file for storage.
*
************************************************************************/

gp = fopen (outfname, "w");

realsig = 0;
imagsig = 0;
realnoise = 0;
imagnoise = 0;

for (j = 0; j <= SIZE; j++)
{
    diffx = xa[j] - a[j];
    diffy = yb[j] - b[j];
    realsig = realsig + (xa[j] * xa[j]);
    imagsig = imagsig + (yb[j] * yb[j]);
    realnoise = realnoise + (diffx * diffx);
    imagnoise = imagnoise + (diffy * diffy);
    dbx = 138.0;
    dby = 138.0;
    if ((diffx != 0.0) && (xa[j] != 0.0))
        dbx = -10.0 * log10 ((diffx * diffx) / (xa[j] * xa[j]));
    if ((diffy != 0.0) && (yb[j] != 0.0))
        dby = -10.0 * log10 ((diffy * diffy) / (yb[j] * yb[j]));
    fprintf (gp, "%d %20.10f%20.10f\n", j, dbx, dby);
}

printf (" Finished transferring output to %s\n", outfname);
printf ("realsig = %20.10f imagsig = %20.10f\n", realsig, imagsig);
printf ("realnoise = %20.10f", realnoise);
printf (" imagnoise = %20.10f\n", imagnoise);
fclose (gp);

snrx = 10.0 * log10 (realsig/realnoise);
snry = 10.0 * log10 (imagsig/imagnoise);

fprintf (hp, "%d %20.10f%20.10f\n", n, snrx, snry);

} /* end n loop */

} /* end main */
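
Each program in this appendix picks its output filename with a 100-way switch on the run number n. As an alternative sketch, and not the method used in the listings, the same names can be built with snprintf from a directory name, a two-digit run number, and a blocklength suffix. The function make_outfname and its parameters are hypothetical names introduced only for this illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build "../<dir>/resultNN_<suffix>" for run n (0..99) into buf and
 * return buf. The caller supplies a buffer of len bytes; 64 bytes is
 * ample for the names used in this appendix. */
char *make_outfname(char *buf, size_t len, const char *dir,
                    const char *suffix, int n)
{
    assert(n >= 0 && n <= 99);   /* the listings use runs 0 through 99 */
    snprintf(buf, len, "../%s/result%02d_%s", dir, n, suffix);
    return buf;
}
```

For example, make_outfname(buf, sizeof buf, "doutput16", "16", 7) yields the same string as case 7 of the switch in the 16-point program. Unlike the string literals in the switch, the result lives in a caller-owned buffer, so it has to stay in scope until fopen has been called.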


B.10 STDDIFF_240.C


/************************************************************************
*
*  Program:  stddiff_240.c
*
*  Author:   Kent Taylor
*
*  Date:     4 November 1985
*
*  Purpose:  To compare the outputs of simulation and standard
*            implementations of Winograd's 16-point DFT algorithm.
*            The simulation routine uses integer arithmetic,
*            while the standard uses double precision
*            real arithmetic. The comparison is done using
*            signal-to-noise ratio (SNR) as the measure. The
*            SNR is computed by summing the magnitude of all
*            signal components (standard outputs) and dividing
*            by the sum of all the noise components (standard
*            minus simulation). The SNR is expressed in dB
*            (10 log ratio); the magnitude of noise is also
*            stored for comparison.
*
*  Inputs:   files of random numbers
*
*  Outputs:  files of magnitude differences
*            SNR in dB
*
************************************************************************/


#include "stdio.h"
#include "math.h"
#include "wino15.c"
#include "wino16.c"

#define INMASK 037777777
#define PI 3.14159265357989
#define SIZE 239
#define LIMIT 20
#define NUMBER 4194303

long rand ();
double log10 ();

main ()

{

double winorsig, winoisig, snrsr, snrsi;
double snrx1, snrx2, snry1, snry2;
double diffx, diffy, snrx, snry, dbx, dby;
double a[SIZE+1], a1[SIZE+1], b[SIZE+1], b1[SIZE+1];
double realsig, imagsig, realnoise, imagnoise;
double x1, y1;
double x2[SIZE+1], y2[SIZE+1];
double x[SIZE+1], y[SIZE+1];
long temp1, temp2;
int j, k, l, n, nn, n1, n2, n3, h[LIMIT];
int ni[4], unsc;
char *outfname;
FILE *gp, *hp, *fopen ();

ni[0] = 15;
ni[1] = 16;
unsc = 31;

hp = fopen ("snr240", "w");
for (n = 0; n <= 99; n++)

{ 

/************************************************************************
*
*  Assign a unique filename to each output file.
*
************************************************************************/

switch  (n) 

{ 

case 0: outfname = "../output240/result00_240"; break;
case 1: outfname = "../output240/result01_240"; break;
case 2: outfname = "../output240/result02_240"; break;
case 3: outfname = "../output240/result03_240"; break;
case 4: outfname = "../output240/result04_240"; break;
case 5: outfname = "../output240/result05_240"; break;
case 6: outfname = "../output240/result06_240"; break;
case 7: outfname = "../output240/result07_240"; break;
case 8: outfname = "../output240/result08_240"; break;
case 9: outfname = "../output240/result09_240"; break;
case 10: outfname = "../output240/result10_240"; break;
case 11: outfname = "../output240/result11_240"; break;
case 12: outfname = "../output240/result12_240"; break;
case 13: outfname = "../output240/result13_240"; break;
case 14: outfname = "../output240/result14_240"; break;
case 15: outfname = "../output240/result15_240"; break;
case 16: outfname = "../output240/result16_240"; break;
case 17: outfname = "../output240/result17_240"; break;
case 18: outfname = "../output240/result18_240"; break;
case 19: outfname = "../output240/result19_240"; break;
case 20: outfname = "../output240/result20_240"; break;
case 21: outfname = "../output240/result21_240"; break;
case 22: outfname = "../output240/result22_240"; break;
case 23: outfname = "../output240/result23_240"; break;
case 24: outfname = "../output240/result24_240"; break;
case 25: outfname = "../output240/result25_240"; break;
case 26: outfname = "../output240/result26_240"; break;
case 27: outfname = "../output240/result27_240"; break;
case 28: outfname = "../output240/result28_240"; break;
case 29: outfname = "../output240/result29_240"; break;
case 30: outfname = "../output240/result30_240"; break;
case 31: outfname = "../output240/result31_240"; break;
case 32: outfname = "../output240/result32_240"; break;
case 33: outfname = "../output240/result33_240"; break;
case 34: outfname = "../output240/result34_240"; break;
case 35: outfname = "../output240/result35_240"; break;
case 36: outfname = "../output240/result36_240"; break;
case 37: outfname = "../output240/result37_240"; break;
case 38: outfname = "../output240/result38_240"; break;
case 39: outfname = "../output240/result39_240"; break;
case 40: outfname = "../output240/result40_240"; break;
case 41: outfname = "../output240/result41_240"; break;
case 42: outfname = "../output240/result42_240"; break;
case 43: outfname = "../output240/result43_240"; break;
case 44: outfname = "../output240/result44_240"; break;
case 45: outfname = "../output240/result45_240"; break;
case 46: outfname = "../output240/result46_240"; break;
case 47: outfname = "../output240/result47_240"; break;
case 48: outfname = "../output240/result48_240"; break;
case 49: outfname = "../output240/result49_240"; break;
case 50: outfname = "../output240/result50_240"; break;
case 51: outfname = "../output240/result51_240"; break;
case 52: outfname = "../output240/result52_240"; break;
case 53: outfname = "../output240/result53_240"; break;
case 54: outfname = "../output240/result54_240"; break;
case 55: outfname = "../output240/result55_240"; break;
case 56: outfname = "../output240/result56_240"; break;
case 57: outfname = "../output240/result57_240"; break;
case 58: outfname = "../output240/result58_240"; break;
case 59: outfname = "../output240/result59_240"; break;
case 60: outfname = "../output240/result60_240"; break;
case 61: outfname = "../output240/result61_240"; break;
case 62: outfname = "../output240/result62_240"; break;
case 63: outfname = "../output240/result63_240"; break;
case 64: outfname = "../output240/result64_240"; break;
case 65: outfname = "../output240/result65_240"; break;
case 66: outfname = "../output240/result66_240"; break;
case 67: outfname = "../output240/result67_240"; break;
case 68: outfname = "../output240/result68_240"; break;
case 69: outfname = "../output240/result69_240"; break;
case 70: outfname = "../output240/result70_240"; break;
case 71: outfname = "../output240/result71_240"; break;
case 72: outfname = "../output240/result72_240"; break;
case 73: outfname = "../output240/result73_240"; break;
case 74: outfname = "../output240/result74_240"; break;
case 75: outfname = "../output240/result75_240"; break;
case 76: outfname = "../output240/result76_240"; break;
case 77: outfname = "../output240/result77_240"; break;
case 78: outfname = "../output240/result78_240"; break;
case 79: outfname = "../output240/result79_240"; break;
case 80: outfname = "../output240/result80_240"; break;
case 81: outfname = "../output240/result81_240"; break;
case 82: outfname = "../output240/result82_240"; break;
case 83: outfname = "../output240/result83_240"; break;
case 84: outfname = "../output240/result84_240"; break;
case 85: outfname = "../output240/result85_240"; break;
case 86: outfname = "../output240/result86_240"; break;
case 87: outfname = "../output240/result87_240"; break;
case 88: outfname = "../output240/result88_240"; break;
case 89: outfname = "../output240/result89_240"; break;
case 90: outfname = "../output240/result90_240"; break;
case 91: outfname = "../output240/result91_240"; break;
case 92: outfname = "../output240/result92_240"; break;
case 93: outfname = "../output240/result93_240"; break;
case 94: outfname = "../output240/result94_240"; break;
case 95: outfname = "../output240/result95_240"; break;
case 96: outfname = "../output240/result96_240"; break;
case 97: outfname = "../output240/result97_240"; break;
case 98: outfname = "../output240/result98_240"; break;
case 99: outfname = "../output240/result99_240"; break;

} /* end switch */

for (j = 0; j <= SIZE; j++)
{
    temp1 = (rand () & INMASK) - NUMBER;
    temp2 = (rand () & INMASK) - NUMBER;
    x[j] = temp1;
    y[j] = temp2;
    a[j] = temp1;
    b[j] = temp2;
}

for (j = 0; j <= 1; j++)    /* pfa outer loop */
{
    n1 = ni[j];
    n2 = 240 / n1;

    for (k = 0; k <= SIZE; k = k + n1)    /* pfa inner loop */
    {
        h[0] = k;
        n3 = k;

        for (l = 1; l <= n1-1; l++)
        {
            n3 = n3 + n2;
            if (n3 >= 240)
                n3 = n3 - 240;
            h[l] = n3;
        }

/************************************************************************
*
*  Compute the DFT using Winograd's Small DFT algorithm for
*  either the 15-point or the 16-point DFT.
*
************************************************************************/

        switch (n1)
        {
            case 15:
            {
                std15 (a, b, h);
                break;
            }
            case 16:
            {
                std16 (a, b, h);
                break;
            }
        }

    } /* end pfa inner loop */

} /* end pfa outer loop */

/* Unscramble the PFA results */
k = 0;

for (j = 0; j <= SIZE; j++)
{
    a1[j] = a[k];
    b1[j] = b[k];
    k = k + unsc;
    if (k >= 240)
        k = k - 240;
}

/* Compute the DFT directly */
for (j = 0; j <= SIZE; j++)
{
    x2[j] = y2[j] = 0;

    for (k = 0; k <= SIZE; k++)
    {
        x1 = (2.0 * PI) / 240;
        y1 = k * j;
        x2[j] = x2[j] + (x[k] * cos (y1 * x1));
        y2[j] = y2[j] - (x[k] * sin (y1 * x1));
        y2[j] = y2[j] + (y[k] * cos (y1 * x1));
        x2[j] = x2[j] + (y[k] * sin (y1 * x1));
    }
}

/************************************************************************
*
*  Compute the differences between the standard and direct DFT
*  results (noise components; realnoise and imagnoise). Compute
*  the SNR for real and imaginary results by dividing the sum of
*  the standard results (signal components; realsig and imagsig)
*  by the sum of the noise components. Send the real and imaginary
*  SNRs to a file containing SNRs for all inputs; send the
*  differences to a file for storage.
*
************************************************************************/

gp = fopen (outfname, "w");

realsig = 0;
imagsig = 0;
realnoise = 0;
imagnoise = 0;
winorsig = winoisig = 0;

for (j = 0; j <= SIZE; j++)
{
    diffx = x2[j] - a1[j];
    diffy = y2[j] - b1[j];

    winorsig = winorsig + (a1[j] * a1[j]);
    winoisig = winoisig + (b1[j] * b1[j]);
    realsig = realsig + (x2[j] * x2[j]);
    imagsig = imagsig + (y2[j] * y2[j]);
    realnoise = realnoise + (diffx * diffx);
    imagnoise = imagnoise + (diffy * diffy);
    dbx = 138.0;
    dby = 138.0;
    if (diffx != 0.0)
        dbx = -10.0 * log10 ((diffx * diffx) / (x2[j] * x2[j]));
    if (diffy != 0.0)
        dby = -10.0 * log10 ((diffy * diffy) / (y2[j] * y2[j]));
    fprintf (gp, "%d %20.10f%20.10f\n", j, dbx, dby);
}

printf (" Finished transferring output to %s\n", outfname);
fclose (gp);

snrsr = 10.0 * log10 (winorsig);
snrsi = 10.0 * log10 (winoisig);
snrx1 = 10.0 * log10 (realsig);
snrx2 = 10.0 * log10 (realnoise);
snry1 = 10.0 * log10 (imagsig);
snry2 = 10.0 * log10 (imagnoise);
snrx = snrx1 - snrx2;
snry = snry1 - snry2;

printf ("snrsr = %20.10f snrsi = %20.10f\n", snrsr, snrsi);
printf ("realsig = %20.10f imagsig = %20.10f\n", snrx1, snry1);
printf ("realnoise = %20.10f", snrx2);
printf (" imagnoise = %20.10f\n", snry2);

fprintf (hp, "%d %20.10f%20.10f\n", n, snrx, snry);

} /* end n loop */

} /* end main */


Appendix  C 

Simulation  Result  Listings 


The following listings are the signal-to-noise ratios (SNRs) computed by the programs
given in Appendix B. The first set of listings is for the standard Winograd
module compared with the direct DFT. The results are for blocklengths of 15, 16, 17,
240, 255, and 272. The second set of listings is for the standard Winograd module
compared with the simulation. The results are for blocklengths of 15, 16, and 17.


Standard vs. Direct DFT, 15-Point

x[0] = 270.3267211914    y[0] = 271.8259582520
x[1] = 270.1993713379    y[1] = 270.9765930176
x[2] = 270.8795166016    y[2] = 274.7530517578
x[3] = 266.9262084961    y[3] = 270.4941711426
x[4] = 272.4660034180    y[4] = 268.6256713867
x[5] = 270.1086120605    y[5] = 269.2436523438
x[6] = 270.3933715820    y[6] = 273.6777343750
x[7] = 288.2878112793    y[7] = 272.1769409180
x[8] = 269.2735595703    y[8] = 267.9232482910
x[9] = 272.6429443359    y[9] = 268.7232666016
x[10] = 274.5322570801    y[10] = 266.1694641113
x[11] = 265.2796325684    y[11] = 273.1672058105
x[12] = 266.3085327148    y[12] = 272.1916503906
x[13] = 270.5764160156    y[13] = 268.4227294922
x[14] = 267.3804626465    y[14] = 267.9915771484
x[15] = 272.3644104004    y[15] = 268.3309936523
x[16] = 270.6399230957    y[16] = 271.3390502930
x[17] = 272.6228027344    y[17] = 268.4317626953
x[18] = 269.4873046875    y[18] = 271.5711669922
x[19] = 271.9138793945    y[19] = 270.1387939453
x[20] = 268.1859130859    y[20] = 273.8852233887
x[21] = 267.5024414063    y[21] = 272.5805969238
x[22] = 269.6394958496    y[22] = 268.3592529297
x[23] = 269.9055786133    y[23] = 264.6501464844
x[24] = 265.3494873047    y[24] = 270.0167541504
x[25] = 267.4769592285    y[25] = 273.1098937988
x[26] = 269.3949279785    y[26] = 273.7298278809
x[27] = 271.2135009766    y[27] = 269.0087585449
x[28] = 271.2580871582    y[28] = 266.7854003906
x[29] = 273.2968750000    y[29] = 269.7937011719
x[30] = 269.2456359863    y[30] = 271.5105590820
x[31] = 269.8008728027    y[31] = 268.7413940430
x[32] = 268.5186462402    y[32] = 268.1859436035
x[33] = 270.1049804688    y[33] = 270.4666748047
x[34] = 271.1744079590    y[34] = 272.6804504395
x[35] = 269.0790100098    y[35] = 273.7297058105
x[36] = 265.5129394531    y[36] = 273.4932250977
x[37] = 269.5553283691    y[37] = 266.9835510254
x[38] = 270.5327758789    y[38] = 267.7444152832
x[39] = 270.8727416992    y[39] = 269.7133483887

x[40] = 273.2836303711    y[40] = 271.7077026367
x[41] = 271.2531738281    y[41] = 268.3661193848
x[42] = 268.0216979980    y[42] = 270.5642089844
x[43] = 266.8464050293    y[43] = 268.9776000977
x[44] = 270.1110839844    y[44] = 268.9873352051
x[45] = 269.3290710449    y[45] = 269.8417358398
x[46] = 269.3952026367    y[46] = 272.2424621582
x[47] = 270.2401123047    y[47] = 266.4663391113
x[48] = 268.9219970703    y[48] = 270.0077209473
x[49] = 267.0398864748    y[49] = 269.5416564941
x[50] = 271.3050537109    y[50] = 267.8937683105
x[51] = 269.7127380371    y[51] = 270.4773559570
x[52] = 271.4542846680    y[52] = 271.5123901367
x[53] = 270.1429748535    y[53] = 271.8498535156
x[54] = 270.7604370117    y[54] = 266.8149108887
x[55] = 270.9530029297    y[55] = 272.4279479980
x[56] = 266.4183654785    y[56] = 274.4346618652
x[57] = 271.4386291504    y[57] = 266.8522949219
x[58] = 272.0279235840    y[58] = 270.6676635742
x[59] = 270.7712097168    y[59] = 268.7197875977
x[60] = 273.1189880371    y[60] = 262.9579467773
x[61] = 271.7225646973    y[61] = 265.9170227051
x[62] = 270.9621887207    y[62] = 270.9728698730
x[63] = 267.4638671875    y[63] = 269.7576599121
x[64] = 271.7090454102    y[64] = 266.3128662109
x[65] = 270.7944641113    y[65] = 266.5230712891
x[66] = 270.7575988770    y[66] = 271.5975341797
x[67] = 267.7533264160    y[67] = 271.5601806641
x[68] = 271.1271667480    y[68] = 268.4807739258
x[69] = 268.4465637207    y[69] = 273.1932067871
x[70] = 273.8880615234    y[70] = 269.1509399414
x[71] = 268.6777954102    y[71] = 267.7762451172
x[72] = 271.8719787598    y[72] = 267.9895019531
x[73] = 272.1277770996    y[73] = 267.4620971680
x[74] = 273.5524902344    y[74] = 268.9857177734
x[75] = 268.7622680664    y[75] = 267.8697509766
x[76] = 270.1966552734    y[76] = 270.9008178711
x[77] = 270.6634216309    y[77] = 270.4273071289
x[78] = 269.5248413086    y[78] = 271.5830078125
x[79] = 265.8132934570    y[79] = 271.5101623535


x[80] = 273.3381042480    y[80] = 267.4616394043
x[81] = 267.3371276855    y[81] = 272.3781738281
x[82] = 266.9151000977    y[82] = 271.9751586914
x[83] = 271.6332702637    y[83] = 270.5668640137
x[84] = 268.2593078613    y[84] = 268.3897399902
x[85] = 268.4362487793    y[85] = 269.7589111328
x[86] = 267.9328002930    y[86] = 271.5138549805
x[87] = 271.2909545898    y[87] = 272.1745605469
x[88] = 269.0054321289    y[88] = 272.1026611328
x[89] = 270.1526794434    y[89] = 268.9882812500
x[90] = 263.8572082520    y[90] = 274.7151794434
x[91] = 265.2599182129    y[91] = 271.5104980469
x[92] = 271.0217285156    y[92] = 266.5699462891
x[93] = 267.6752014160    y[93] = 274.0068969727
x[94] = 265.1853637695    y[94] = 271.4517822266
x[95] = 269.4737548828    y[95] = 268.9801635742
x[96] = 270.2192687988    y[96] = 266.2698364258
x[97] = 268.3923339844    y[97] = 268.8155822754
x[98] = 273.4592895508    y[98] = 269.0731201172
x[99] = 267.4300231934    y[99] = 271.1868896484

             Real              Imag
Mean         269.7486669922    269.9851544189
Std. Dev.    2.2155283971      2.3963705533
Minimum      263.8572082520    262.9579467773
Maximum      274.5322570801    274.7530517578


Standard vs. Direct DFT, 16-Point

x[0] = 272.7552795410    y[0] = 268.9796142578
x[1] = 272.1343078613    y[1] = 270.4200439453
x[2] = 272.9894104004    y[2] = 267.0741882324
x[3] = 269.5061035156    y[3] = 269.7828369141
x[4] = 271.7872009277    y[4] = 269.5117492676
x[5] = 268.9995422363    y[5] = 267.9895629883
x[6] = 269.0440979004    y[6] = 271.7550048828
x[7] = 272.5062866211    y[7] = 270.9828491211
x[8] = 271.4386291504    y[8] = 265.9161682129
x[9] = 268.2858278367    y[9] = 268.1496582031
x[10] = 269.6635437012    y[10] = 269.8590087891
x[11] = 269.8771972656    y[11] = 269.0490112305
x[12] = 272.2474060059    y[12] = 265.7119445801
x[13] = 269.8631286621    y[13] = 268.6048583984
x[14] = 273.3859863281    y[14] = 270.4118957520
x[15] = 269.6500244141    y[15] = 270.3233337402
x[16] = 269.4855346680    y[16] = 268.4541625977
x[17] = 266.8340759277    y[17] = 269.9486694336
x[18] = 268.6825256348    y[18] = 272.4471130371
x[19] = 265.6889953813    y[19] = 268.9368591309
x[20] = 268.6466369629    y[20] = 271.1831054688
x[21] = 271.3880004883    y[21] = 267.9326171875
x[22] = 269.5439453125    y[22] = 270.4999389648
x[23] = 264.3802795410    y[23] = 273.8593750000
x[24] = 271.4773559570    y[24] = 271.7287597656
x[25] = 268.1225280762    y[25] = 270.9976501465
x[26] = 265.7618713379    y[26] = 273.3869323730
x[27] = 268.4483642578    y[27] = 273.7132568359
x[28] = 269.3742065430    y[28] = 273.4106140137
x[29] = 269.9822692871    y[29] = 270.0431823730
x[30] = 267.9257202148    y[30] = 268.2225036621
x[31] = 266.8103637695    y[31] = 270.1790771484
x[32] = 266.4388122559    y[32] = 269.7553710938
x[33] = 266.2378540039    y[33] = 274.9422512598
x[34] = 271.0850219727    y[34] = 270.1690673828
x[35] = 272.6909790039    y[35] = 270.7511901855

268.1923706055 

271.4857482010 

272.9253845215 

268.4839477539 


270.8391723633 
271.5530395508 
270  3072204590 
269  5484313965 


x[40] = 268.3330383301    y[40] = 269.7852783203
x[41] = 269.1254272461    y[41] = 269.3233947754
x[42] = 268.8452148438    y[42] = 268.4582214355
x[43] = 273.3180705566    y[43] = 270.5591125488
x[44] = 268.9049377441    y[44] = 266.7042541504
x[45] = 271.9574890137    y[45] = 268.7884826660
x[46] = 260.9253234863    y[46] = 277.2808227539
x[47] = 269.6184692383    y[47] = 273.6904296875
x[48] = 270.7478637695    y[48] = 274.3979187012
x[49] = 271.6245727539    y[49] = 271.3009643555
x[50] = 269.8124389648    y[50] = 271.8434143066
x[51] = 272.3414001465    y[51] = 273.0943298340
x[52] = 272.1879882813    y[52] = 269.4304199219
x[53] = 265.5771789551    y[53] = 272.2980651855
x[54] = 268.9840393066    y[54] = 271.2325439453
x[55] = 271.3327636719    y[55] = 269.8065490723
x[56] = 268.9931335449    y[56] = 270.9717407227
x[57] = 269.2901000977    y[57] = 268.7641601563
x[58] = 269.7671203613    y[58] = 271.5994873047
x[59] = 267.4548950195    y[59] = 271.3090209961
x[60] = 270.0095214844    y[60] = 268.9137268066
x[61] = 275.0334472656    y[61] = 266.4119567871
x[62] = 273.6075744629    y[62] = 270.5734863281
x[63] = 270.5801696777    y[63] = 268.4373779297
x[64] = 264.6386108398    y[64] = 273.6230773926
x[65] = 273.3971557617    y[65] = 268.9710693359
x[66] = 274.4778137207    y[66] = 266.6929626465
x[67] = 270.4531860352    y[67] = 272.0224609375
x[68] = 268.2864990234    y[68] = 271.1570739746
x[69] = 271.3478088379    y[69] = 270.3151550293
x[70] = 269.1685485840    y[70] = 268.4387512207
x[71] = 266.7461242676    y[71] = 273.5340270996
x[72] = 272.3862915039    y[72] = 269.6380920410
x[73] = 273.0481262207    y[73] = 265.4359741211
x[74] = 268.3798828125    y[74] = 270.9328918457
x[75] = 269.9814453125    y[75] = 268.4600219727
x[76] = 271.0657043457    y[76] = 269.3445739746
x[77] = 269.5472106934    y[77] = 269.7560119629
x[78] = 272.6813964844    y[78] = 269.8146057129
x[79] = 269.6702575684    y[79] = 271.4771423340

x[80] = 275.4037780762    y[80] = 266.9851684570
x[81] = 271.6023559570    y[81] = 268.7348327637
x[82] = 270.4073486328    y[82] = 268.1242065430
x[83] = 271.3636474609    y[83] = 266.9815673828
x[84] = 271.1292419434    y[84] = 269.1577758789
x[85] = 271.0298158738    y[85] = 267.4655456543
x[86] = 271.7021179199    y[86] = 269.4682617188
x[87] = 271.2106933594    y[87] = 269.5105590820
x[88] = 271.1216430664    y[88] = 268.2117004395
x[89] = 270.6502685547    y[89] = 269.2516174316
x[90] = 268.8835449219    y[90] = 267.9707946777
x[91] = 271.6461181641    y[91] = 260.9583740234
x[92] = 268.6901855469    y[92] = 273.1594543457
x[93] = 269.4382324219    y[93] = 268.0898742676
x[94] = 270.8758544922    y[94] = 266.6210632324
x[95] = 272.0344543457    y[95] = 266.7901000977
x[96] = 270.6707763672    y[96] = 269.9903564453
x[97] = 267.9556884766    y[97] = 268.6535034180
x[98] = 269.1480712891    y[98] = 271.4492492676
x[99] = 267.2027587891    y[99] = 269.9615783691

             Real              Imag
Mean         270.0033959961    269.8945596313
Std. Dev.    2.3937631598      2.3372463719
Minimum      260.9253234863    260.9583740234
Maximum      275.4037780762    277.2808227539

Standard vs. Direct DFT, 17-Point

x[0] = 269.8819885254    y[0] = 270.7600402832
x[1] = 269.4014282227    y[1] = 272.1129150391
x[2] = 272.0111999512    y[2] = 272.0293884277
x[3] = 268.9029846191    y[3] = 271.9183654785
x[4] = 268.4132995605    y[4] = 268.0768432617
x[5] = 269.8236389160    y[5] = 268.0231933594
x[6] = 273.5191955566    y[6] = 266.7305603027
x[7] = 269.6670532227    y[7] = 269.5509643555
x[8] = 270.8864440918    y[8] = 269.4957885742
x[9] = 270.0984497070    y[9] = 269.8625183105
x[10] = 270.8583374023    y[10] = 270.9628601074
x[11] = 270.8510742188    y[11] = 268.5837097168
x[12] = 270.6906127930    y[12] = 268.6798706055
x[13] = 272.8601684570    y[13] = 273.6946105957
x[14] = 273.4621276855    y[14] = 266.7900085449
x[15] = 268.5950317383    y[15] = 269.5240173340
x[16] = 271.8672485352    y[16] = 264.9834899902
x[17] = 271.0851135254    y[17] = 269.1212768555
x[18] = 266.5239257813    y[18] = 274.0615844727
x[19] = 271.5162048340    y[19] = 270.4394531250
x[20] = 268.3395690918    y[20] = 268.6579895020
x[21] = 267.7672424316    y[21] = 269.1622924805
x[22] = 268.6538391113    y[22] = 270.5011596680
x[23] = 269.3810424805    y[23] = 271.2490844727
x[24] = 270.8980407715    y[24] = 267.3539123535
x[25] = 270.3724365234    y[25] = 268.2624511719
x[26] = 269.9035949707    y[26] = 269.6186218262
x[27] = 269.1344909668    y[27] = 267.0343017578
x[28] = 265.9377746582    y[28] = 270.1463012695
x[29] = 271.7009887695    y[29] = 267.6523132324
x[30] = 270.4732971191    y[30] = 267.3661193848
x[31] = 272.6469726563    y[31] = 272.1127929688
x[32] = 268.8228149414    y[32] = 269.0408935547
x[33] = 273.7104492188    y[33] = 273.3024291992
x[34] = 269.1943054199    y[34] = 269.5977478027
x[35] = 271.5632019043    y[35] = 273.6450604004
x[36] = 269.4310913086    y[36] = 269.6079101563
x[37] = 268.1188354492    y[37] = 267.0149218750
x[38] = 269.1134338379    y[38] = 267.6660461426
x[39] = 268.6231079102    y[39] = 272.7576904297

x[40] = 271.0748596191    y[40] = 264.4887695313
x[41] = 265.2482604980    y[41] = 270.3030700684
x[42] = 265.4088439941    y[42] = 270.5873718262
x[43] = 269.1730346680    y[43] = 264.2818603516
x[44] = 272.2053222656    y[44] = 267.3887329102
x[45] = 267.9991455078    y[45] = 274.9462280273
x[46] = 270.1383361816    y[46] = 274.0894775391
x[47] = 271.5760498047    y[47] = 270.5509338379
x[48] = 270.5822143555    y[48] = 271.3222656250
x[49] = 273.2935485840    y[49] = 268.9748535156
x[50] = 269.3102416992    y[50] = 268.2997741699
x[51] = 270.2906494141    y[51] = 271.1800842285
x[52] = 268.3794555684    y[52] = 269.4108886719
x[53] = 267.1316528320    y[53] = 269.9113769531
x[54] = 270.3419189453    y[54] = 270.9147949219
x[55] = 273.1160583496    y[55] = 265.6920471191
x[56] = 271.4096984863    y[56] = 271.8542175293
x[57] = 268.4126892090    y[57] = 273.2379760742
x[58] = 265.3532409688    y[58] = 274.1091918945
x[59] = 269.4324645996    y[59] = 270.6475219727
x[60] = 269.1028747559    y[60] = 269.0531616211
x[61] = 272.4880065918    y[61] = 269.4778442383
x[62] = 271.4067993164    y[62] = 272.0108032227
x[63] = 271.1753845215    y[63] = 271.3197631836
x[64] = 270.3529968262    y[64] = 269.5959472656
x[65] = 272.8681335449    y[65] = 267.4597167969
x[66] = 269.1588439941    y[66] = 265.5984497070
x[67] = 271.3894653320    y[67] = 267.6537170410
x[68] = 270.2159729004    y[68] = 267.0043334961
x[69] = 266.4914855957    y[69] = 271.4144897461
x[70] = 267.1986083984    y[70] = 270.6966247559
x[71] = 266.6242675781    y[71] = 268.6539916992
x[72] = 271.1471557617    y[72] = 268.8063354492
x[73] = 268.0986633301    y[73] = 272.5935058594
x[74] = 268.4238891602    y[74] = 268.6405029297
x[75] = 269.7712707520    y[75] = 269.3260803223
x[76] = 267.1796264648    y[76] = 266.6194763184
x[77] = 271.7876892090    y[77] = 265.1469116211
x[78] = 271.5567626953    y[78] = 269.6063537598
x[79] = 270.0137939453    y[79] = 269.1286315918
x[80] = 265.9388122559    y[80] = 270.9612121582
x[81] = 268.4763488770    y[81] = 273.0566406250
x[82] = 268.9838867188    y[82] = 274.3491821289
x[83] = 269.2582397461    y[83] = 269.8005065918
x[84] = 269.4443664551    y[84] = 273.0825805664
x[85] = 270.5592346191    y[85] = 261.4719238281
x[86] = 268.5687866211    y[86] = 267.5593872070
x[87] = 272.0215148926    y[87] = 270.6485290527
x[88] = 272.9148864746    y[88] = 262.5549926758
x[89] = 267.7056884766    y[89] = 269.8771667480
x[90] = 272.4746093750    y[90] = 267.7489624023
x[91] = 273.3935546875    y[91] = 273.5779113770
x[92] = 266.5085144043    y[92] = 273.7836914063
x[93] = 264.3068847656    y[93] = 272.3791809082
x[94] = 273.5789794922    y[94] = 267.7886047363
x[95] = 271.1185302734    y[95] = 268.3768310547
x[96] = 263.3368225098    y[96] = 271.6583862305
x[97] = 270.5970153809    y[97] = 269.5770874023
x[98] = 266.7500610352    y[98] = 274.2363281250
x[99] = 270.9354248047    y[99] = 270.1430358887

             Real              Imag
Mean         269.7790359497    269.7181231689
Std. Dev.    2.2153467436      2.6401404206
Minimum      263.3368225098    261.4719238281
Maximum      273.7104492188    274.9462280273

Standard vs. Direct DFT, 240-Point

x[0] = 176.9281005859    y[0] = 176.0798339844
x[1] = 175.8159942627    y[1] = 176.5415496826
x[2] = 175.2160034180    y[2] = 175.1606445313
x[3] = 175.4710540771    y[3] = 176.7188262939
x[4] = 176.0864715576    y[4] = 175.0678100586
x[5] = 175.6123352051    y[5] = 175.8395233154
x[6] = 175.0600128174    y[6] = 176.5708312988
x[7] = 176.3147125244    y[7] = 176.7072753906
x[8] = 175.8805694580    y[8] = 175.4317932129
x[9] = 175.9056701660    y[9] = 175.6567382813
x[10] = 175.5078125000    y[10] = 176.0266265869
x[11] = 176.0314483643    y[11] = 176.3258056641
x[12] = 176.7476043701    y[12] = 176.8772735596
x[13] = 175.7143096924    y[13] = 175.5060729980
x[14] = 175.2527618408    y[14] = 175.3540496826
x[15] = 175.5321197510    y[15] = 175.8341979980
x[16] = 175.7600402832    y[16] = 176.0635375977
x[17] = 176.2900390625    y[17] = 176.5039672852
x[18] = 175.4552764893    y[18] = 177.0667724609
x[19] = 176.2388153076    y[19] = 176.0251159668
x[20] = 175.3146209717    y[20] = 177.0319824219
x[21] = 175.7579803467    y[21] = 175.4255218506
x[22] = 175.3465728760    y[22] = 176.9929962158
x[23] = 177.0458984375    y[23] = 176.3945007324
x[24] = 176.5691680908    y[24] = 174.9557189941
x[25] = 175.6906127930    y[25] = 176.5028533936
x[26] = 174.4704895020    y[26] = 176.0759582520
x[27] = 176.4285430908    y[27] = 175.3894348145
x[28] = 176.7745819092    y[28] = 175.2816925049
x[29] = 175.0116577148    y[29] = 175.5447692871
x[30] = 176.6018676758    y[30] = 175.9234008789
x[31] = 175.1762542725    y[31] = 175.9215393066
x[32] = 176.3634948730    y[32] = 175.5064239502
x[33] = 175.9099731445    y[33] = 176.2240753174
x[34] = 176.0561370850    y[34] = 175.6609039307
x[35] = 176.2507324219    y[35] = 175.1939849854
x[36] = 176.3163452148    y[36] = 175.3961791992
x[37] = 175.6886749268    y[37] = 176.1816101074
x[38] = 175.1196289063    y[38] = 176.7167358398
x[39] = 176.7915344238    y[39] = 177.3120269775


x[40] = 175.6382751465    y[40] = 176.2698364258
x[41] = 176.2331542969    y[41] = 176.1804504395
x[42] = 176.0434112549    y[42] = 175.7587432861
x[43] = 175.8974609375    y[43] = 176.3019104004
x[44] = 174.9062652588    y[44] = 176.3957519531
x[45] = 175.9091339111    y[45] = 175.9953918457
x[46] = 176.9732666016    y[46] = 175.7868957520
x[47] = 176.2299804688    y[47] = 176.1300811768
x[48] = 176.7227020264    y[48] = 174.9942169189
x[49] = 175.7549285889    y[49] = 174.9685363770

             Real              Imag
Mean         175.9162899780    175.9954473877
Std. Dev.    0.5917510921      0.6040244160
Minimum      174.4704895020    174.9557189941
Maximum      177.0458984375    177.3120269775

Standard vs. Direct DFT, 255-Point

x[0] = 176.9676361084    y[0] = 175.2451019287
x[1] = 174.5153198242    y[1] = 176.2055664063
x[2] = 174.5861968994    y[2] = 174.6942901611
x[3] = 175.7495880127    y[3] = 175.2100830078
x[4] = 175.2971343994    y[4] = 175.2332305908
x[5] = 174.5655364990    y[5] = 174.9668121338
x[6] = 176.4736938477    y[6] = 175.4824066162
x[7] = 175.0034332275    y[7] = 175.5998382568
x[8] = 176.3172302246    y[8] = 174.4532775879
x[9] = 175.8294982910    y[9] = 175.4020233154
x[10] = 175.3798675537    y[10] = 175.9258575439
x[11] = 175.6436309814    y[11] = 176.3606414795
x[12] = 175.7827758789    y[12] = 175.9241943359
x[13] = 175.2949218750    y[13] = 175.1103210449
x[14] = 174.2272186279    y[14] = 175.7443389893
x[15] = 175.2727966309    y[15] = 175.5301971436
x[16] = 175.9711914063    y[16] = 175.7611999512
x[17] = 174.3645629883    y[17] = 177.0066986084
x[18] = 175.1951751709    y[18] = 176.0203552246
x[19] = 175.7872314453    y[19] = 176.1752777100
x[20] = 174.9986877441    y[20] = 176.0463256836
x[21] = 174.9064483643    y[21] = 176.6525115967
x[22] = 175.4567718506    y[22] = 174.6132049561
x[23] = 175.2766571045    y[23] = 176.2723083496
x[24] = 175.6004486084    y[24] = 174.9115295410
x[25] = 175.7740478516    y[25] = 176.3608856201
x[26] = 176.3112030029    y[26] = 174.2060699463
x[27] = 174.8421936035    y[27] = 175.5248870850
x[28] = 176.3071899414    y[28] = 175.9502110889
x[29] = 174.1696319580    y[29] = 176.0451202393
x[30] = 175.4562683105    y[30] = 175.7846221924
x[31] = 175.9324798584    y[31] = 175.1319427490
x[32] = 175.7597045898    y[32] = 175.0276641846
x[33] = 176.0273742676    y[33] = 175.1878051758
x[34] = 174.3921966553    y[34] = 176.3043670654
x[35] = 175.7931365967    y[35] = 175.1776916504
x[36] = 175.6822814941    y[36] = 176.1872253418
x[37] = 175.6284790039    y[37] = 175.6081848145
x[38] = 175.1247863770    y[38] = 175.9377441406
x[39] = 176.3980560303    y[39] = 174.8244781494
x[40] = 175.2262725830    y[40] = 174.9261322021
x[41] = 175.0415039063    y[41] = 176.4279937744
x[42] = 176.1199493408    y[42] = 175.8008728027
x[43] = 175.5458526611    y[43] = 175.7980651855
x[44] = 175.7533569336    y[44] = 176.4787597656
x[45] = 175.9052429199    y[45] = 175.2508544922
x[46] = 175.4692535400    y[46] = 174.8632507324
x[47] = 175.8170623779    y[47] = 175.9199829102
x[48] = 175.5901489258    y[48] = 176.1115264893
x[49] = 175.8327026367    y[49] = 174.9668426514

             Real              Imag
Mean         175.4872805786    175.6130160522
Std. Dev.    0.6166290744      0.6181964662
Minimum      174.1696319580    174.2060699463
Maximum      176.9676361084    177.0066986084

Standard vs. Direct DFT, 272-Point

x[0] = 175.3172760010    y[0] = 175.6668548584
x[1] = 175.2134399414    y[1] = 175.1342163086
x[2] = 174.6766662598    y[2] = 175.0067749023
x[3] = 175.8343658447    y[3] = 174.6544647217
x[4] = 174.6794281006    y[4] = 175.0544738770
x[5] = 174.7781066895    y[5] = 174.3569335938
x[6] = 175.7173919678    y[6] = 175.3242187500
x[7] = 174.7194671631    y[7] = 174.3413238525
x[8] = 174.6242370605    y[8] = 174.6977691650
x[9] = 174.6361236572    y[9] = 174.6449890137
x[10] = 174.8958282471    y[10] = 176.0199432373
x[11] = 175.3851013184    y[11] = 175.5915069580
x[12] = 175.1615142822    y[12] = 174.6331939697
x[13] = 173.5044860840    y[13] = 175.2667846680
x[14] = 175.7932128906    y[14] = 173.9664916992
x[15] = 175.0761718750    y[15] = 175.5431213379
x[16] = 174.8210754395    y[16] = 174.4515075684
x[17] = 175.1059112549    y[17] = 175.4931030273
x[18] = 175.3726654053    y[18] = 174.6825256348
x[19] = 175.6221313477    y[19] = 174.6659088135
x[20] = 176.3201446533    y[20] = 174.7219696045
x[21] = 174.5650024414    y[21] = 174.1995239258
x[22] = 174.6980743408    y[22] = 175.3710479736
x[23] = 173.9389953613    y[23] = 174.4743041992
x[24] = 174.4600982666    y[24] = 175.2648773193
x[25] = 175.1066436768    y[25] = 174.6895141602
x[26] = 174.3740081787    y[26] = 176.3616333008
x[27] = 174.1670227051    y[27] = 176.0292053223
x[28] = 175.9723510742    y[28] = 174.9096527100
x[29] = 175.1481628418    y[29] = 174.7274627686
x[30] = 174.8263854980    y[30] = 174.9681091309
x[31] = 175.6766510010    y[31] = 175.3684082031
x[32] = 174.6771087646    y[32] = 175.4441986084
x[33] = 175.5671997070    y[33] = 174.1645965576
x[34] = 176.4279022217    y[34] = 175.6058502197
x[35] = 175.4553070068    y[35] = 175.6769714355
x[36] = 175.3783569336    y[36] = 174.9795837402
x[37] = 175.1912384033    y[37] = 174.8423309326
x[38] = 175.2342376709    y[38] = 174.4134216309
x[39] = 174.3762969971    y[39] = 175.0219268799
x[40] = 175.2068023682    y[40] = 174.7622070313
x[41] = 175.3159332275    y[41] = 175.4618225098
x[42] = 175.1643066406    y[42] = 174.8082733154
x[43] = 175.0156707764    y[43] = 174.7454071045
x[44] = 175.5962068650    y[44] = 175.1154479980
x[45] = 175.4506225586    y[45] = 175.8812713623
x[46] = 174.8144683838    y[46] = 174.0914154053
x[47] = 176.2023773193    y[47] = 174.7069396973
x[48] = 175.1609954834    y[48] = 174.3604431152
x[49] = 175.0452728271    y[49] = 174.8707122803

             Real              Imag
Mean         175.1093688965    174.9846926880
Std. Dev.    0.5806204977      0.5435851097
Minimum      173.5044860840    173.9664916992
Maximum      176.4279022217    176.3616333008


Standard vs. Simulation, 15-Point

x[0] = 126.9658889771    y[0] = 128.6595001221
x[1] = 127.1139373779    y[1] = 125.0317230225
x[2] = 124.3147811890    y[2] = 130.9378356934
x[3] = 126.6682891846    y[3] = 123.7885589600
x[4] = 128.8446350098    y[4] = 127.2956924438
x[5] = 125.8834075928    y[5] = 124.4208068848
x[6] = 126.0367584229    y[6] = 127.0050735474
x[7] = 125.7816925049    y[7] = 126.9526977539
x[8] = 124.3878250122    y[8] = 125.0147018433
x[9] = 123.5895309448    y[9] = 125.9368896484
x[10] = 127.3346557617    y[10] = 125.1703872681
x[11] = 123.9044952393    y[11] = 130.2939453125
x[12] = 124.7136917114    y[12] = 127.7754821777
x[13] = 126.8959732056    y[13] = 123.9404067993
x[14] = 125.4262542725    y[14] = 125.1703872681
x[15] = 126.2954330444    y[15] = 124.6974182129
x[16] = 126.7282409668    y[16] = 125.8010559082
x[17] = 126.1213912964    y[17] = 123.6008453369
x[18] = 125.6085433960    y[18] = 126.4449844360
x[19] = 125.6101455688    y[19] = 127.8805313110
x[20] = 122.3669967651    y[20] = 128.0266342163
x[21] = 125.6510925293    y[21] = 129.1865692139
x[22] = 128.3963470459    y[22] = 125.2900390625
x[23] = 126.4536743164    y[23] = 124.4100952148
x[24] = 126.0329818726    y[24] = 125.9576416016
x[25] = 124.4752349854    y[25] = 127.6567687988
x[26] = 123.1487426758    y[26] = 126.0815734863
x[27] = 124.8630447388    y[27] = 125.0927429199
x[28] = 128.4418182373    y[28] = 125.2314453125
x[29] = 127.6404342651    y[29] = 127.5886688232
x[30] = 124.8626632690    y[30] = 128.0912170410
x[31] = 125.2672424316    y[31] = 121.3894500732
x[32] = 126.1647338867    y[32] = 129.5290985107
x[33] = 124.9801940918    y[33] = 126.7948989868
x[34] = 126.1622467041    y[34] = 129.0379638672
x[35] = 125.2439117432    y[35] = 127.1847610474
x[36] = 122.9684448242    y[36] = 129.4932861328
x[37] = 126.7451705933    y[37] = 125.9872894287
x[38] = 127.6056671143    y[38] = 125.6809234619
x[39] = 128.3607940674    y[39] = 127.5285491943

x[40] = 127.6127700806    y[40] = 123.8710327148
x[41] = 128.1640472412    y[41] = 124.5534667969
x[42] = 126.5611190796    y[42] = 128.2700653076
x[43] = 126.7883224487    y[43] = 128.5454101563
x[44] = 127.5197525024    y[44] = 128.5200958252
x[45] = 124.3297882080    y[45] = 126.2699432373
x[46] = 126.3443374634    y[46] = 128.9873199463
x[47] = 127.4374389648    y[47] = 125.6204376221
x[48] = 127.3305206299    y[48] = 128.5934906006
x[49] = 124.4188232422    y[49] = 126.6717681885
x[50] = 126.6728057861    y[50] = 128.5397720337
x[51] = 125.3947372437    y[51] = 126.8625488281
x[52] = 125.8040542603    y[52] = 127.7437286377
x[53] = 126.8680572510    y[53] = 128.9866333008
x[54] = 125.6404037476    y[54] = 125.2381591797
x[55] = 124.9454727173    y[55] = 126.1306076050
x[56] = 124.2371215820    y[56] = 126.6063919067
x[57] = 125.3613662720    y[57] = 124.4652938843
x[58] = 126.9989395142    y[58] = 127.8016235352
x[59] = 126.6637268066    y[59] = 127.5234756470
x[60] = 127.3334579468    y[60] = 121.6755905151
x[61] = 127.7255859375    y[61] = 128.1257476807
x[62] = 125.7728652954    y[62] = 123.1785507202
x[63] = 124.9451065063    y[63] = 127.4509277344
x[64] = 127.8579711914    y[64] = 120.5933532715
x[65] = 126.3562469482    y[65] = 124.7063980103
x[66] = 124.9039688110    y[66] = 129.1341094971
x[67] = 125.9150085449    y[67] = 125.2517395020
x[68] = 124.8955612183    y[68] = 124.9134750366
x[69] = 125.8209228516    y[69] = 126.7678756714
x[70] = 128.2415161133    y[70] = 126.4128189087
x[71] = 126.5546951294    y[71] = 127.0057525635
x[72] = 127.0476074219    y[72] = 124.6814117432
x[73] = 127.4869155884    y[73] = 125.4552459717
x[74] = 126.9031829834    y[74] = 125.3369293213
x[75] = 126.5438232422    y[75] = 130.2485656738
x[76] = 125.9665832520    y[76] = 126.6366653442
x[77] = 128.6787719727    y[77] = 127.0721945068
x[78] = 125.3847656250    y[78] = 127.0151290894
x[79] = 121.6201248169    y[79] = 127.9532546997

x[80] = 125.8070831299    y[80] = 123.7141952515
x[81] = 126.6072845459    y[81] = 127.1912384033
x[82] = 125.6616973877    y[82] = 127.1672515869
x[83] = 127.5524520874    y[83] = 125.1179733276
x[84] = 124.9084472656    y[84] = 125.3907165527
x[85] = 124.3799057007    y[85] = 124.6561660767
x[86] = 124.3708190918    y[86] = 124.7729339600
x[87] = 125.6674728394    y[87] = 127.2672195435
x[88] = 126.4710006714    y[88] = 126.4708115723
x[89] = 124.8405914307    y[89] = 125.7980117798
x[90] = 125.8368682861    y[90] = 127.0739288330
x[91] = 123.6946792603    y[91] = 126.4280700684
x[92] = 125.9281616211    y[92] = 123.9990463257
x[93] = 124.6118316650    y[93] = 128.0190582275
x[94] = 124.6166687012    y[94] = 127.2861557007
x[95] = 126.3000869751    y[95] = 126.4723205566
x[96] = 125.5646057129    y[96] = 125.7940750122
x[97] = 125.2095794678    y[97] = 126.0042343140
x[98] = 125.4065628052    y[98] = 127.6092834473
x[99] = 125.3776702881    y[99] = 126.1830444336

             Real              Imag
Mean         125.9391876221    126.3969137573
Std. Dev.    1.3873679241      1.8051794246
Minimum      121.6201248169    120.5933532715
Maximum      128.8446350098    130.9378356934


Standard  vs.  Simulation,  16-Point


x[0] = 131.6860504150    y[0] = 128.5588684082
x[1] = 126.2768936157    y[1] = 126.5050506592
x[2] = 129.6031494141    y[2] = 125.9955062866
x[3] = 127.3920211792    y[3] = 125.4859466553
x[4] = 130.0837402344    y[4] = 128.9797515869
x[5] = 129.1377258301    y[5] = 128.2185974121
x[6] = 127.2726821899    y[6] = 127.4755172729
x[7] = 126.6516876221    y[7] = 127.4683837891
x[8] = 128.4108276367    y[8] = 125.9150314331
x[9] = 128.5280303955    y[9] = 127.2252960205
x[10] = 125.5002517700    y[10] = 133.6931915283
x[11] = 127.8747558594    y[11] = 127.8566436768
x[12] = 130.7615966797    y[12] = 128.2600708008
x[13] = 124.7099914551    y[13] = 126.8356246948
x[14] = 128.4129943848    y[14] = 126.2015075684
x[15] = 126.9171371460    y[15] = 128.9027557373
x[16] = 131.8934173584    y[16] = 125.0395584106
x[17] = 127.7030563354    y[17] = 128.3233032227
x[18] = 128.4555358887    y[18] = 127.7306365967
x[19] = 125.1961898804    y[19] = 128.1782684326
x[20] = 127.3693542480    y[20] = 131.4153137207
x[21] = 126.4642181396    y[21] = 126.6883087158
x[22] = 129.0205841064    y[22] = 126.4337387085
x[23] = 124.9503555298    y[23] = 132.9751434326
x[24] = 125.8017578125    y[24] = 127.3063812256
x[25] = 126.2557220459    y[25] = 130.3205108643
x[26] = 124.4999465942    y[26] = 127.4365386963
x[27] = 125.2276916504    y[27] = 129.3834533691
x[28] = 126.0257110596    y[28] = 130.0014343262
x[29] = 125.9543762207    y[29] = 127.5649948120
x[30] = 126.1363525391    y[30] = 128.7181854248
x[31] = 123.9893798828    y[31] = 129.1547698975
x[32] = 127.2222442627    y[32] = 127.8531951904
x[33] = 123.7135543823    y[33] = 130.2106018066
x[34] = 126.2118682861    y[34] = 128.7924041748
x[35] = 128.4245300293    y[35] = 129.1282501221
x[36] = 125.4156341553    y[36] = 127.9233169556
x[37] = 128.2499237061    y[37] = 127.1410751343
x[38] = 128.5164947510    y[38] = 125.4722213745
x[39] = 127.4740142822    y[39] = 128.3810577393

x[40] = 128.7833862305    y[40] = 127.5596847534
x[41] = 127.2169799805    y[41] = 129.6949310303
x[42] = 126.2682288235    y[42] = 128.2455596924
x[43] = 126.7479629517    y[43] = 127.9707107544
x[44] = 129.6607360840    y[44] = 127.3498916626
x[45] = 130.4431762695    y[45] = 127.9753112793
x[46] = 119.8516998291    y[46] = 129.3677520752
x[47] = 128.3192596436    y[47] = 128.5295715332
x[48] = 125.9970321655    y[48] = 128.0255126953
x[49] = 127.9612731934    y[49] = 128.1005249023
x[50] = 128.5718841553    y[50] = 129.4289245605
x[51] = 127.5533065796    y[51] = 128.1962890625
x[52] = 127.1122436523    y[52] = 125.2886352539
x[53] = 124.5508193970    y[53] = 128.9994506836
x[54] = 124.2272644043    y[54] = 128.7111511230
x[55] = 125.7153244019    y[55] = 126.2460174561
x[56] = 126.5410690308    y[56] = 130.8160705566
x[57] = 127.0897369385    y[57] = 126.3504333496
x[58] = 125.0207595825    y[58] = 128.0140686035
x[59] = 129.4002075195    y[59] = 128.8536682129
x[60] = 125.3560104370    y[60] = 127.4634628296
x[61] = 131.5325012207    y[61] = 126.8283767700
x[62] = 128.2161865234    y[62] = 127.7228469849
x[63] = 128.2512817383    y[63] = 123.9412612915
x[64] = 124.1138458252    y[64] = 132.8595428467
x[65] = 128.7828826904    y[65] = 128.0684661865
x[66] = 132.4055175781    y[66] = 124.2985458374
x[67] = 128.9963684082    y[67] = 127.7581024170
x[68] = 127.6422271729    y[68] = 130.9707946777
x[69] = 127.7262344360    y[69] = 127.8050079346
x[70] = 127.9575729370    y[70] = 126.2689437866
x[71] = 127.3208770752    y[71] = 127.8360519409
x[72] = 128.9340209961    y[72] = 126.6705780029
x[73] = 127.7839202881    y[73] = 125.0569152832
x[74] = 126.7753448486    y[74] = 126.6033554077
x[75] = 127.6877136230    y[75] = 126.3947296143
x[76] = 126.5776748657    y[76] = 127.2083358765
x[77] = 128.0478057861    y[77] = 126.9995956421
x[78] = 130.2034301758    y[78] = 128.5143127441
x[79] = 126.2883758545    y[79] = 125.8590210479

x[80] = 129.0808410645    y[80] = 124.5049972534
x[81] = 131.4768829346    y[81] = 125.0545959473
x[82] = 128.4235992432    y[82] = 126.4251098633
x[83] = 129.9603576660    y[83] = 125.8778915405
x[84] = 129.2503814697    y[84] = 130.3973236084
x[85] = 126.9571685791    y[85] = 125.9371643066
x[86] = 126.4190673828    y[86] = 128.4761199951
x[87] = 127.9010543823    y[87] = 124.4585342407
x[88] = 128.4527130127    y[88] = 127.4746475220
x[89] = 126.0189590454    y[89] = 129.0239257813
x[90] = 127.2768098924    y[90] = 125.2287292480
x[91] = 130.0948028564    y[91] = 119.5501251221
x[92] = 125.2066574097    y[92] = 128.3637237549
x[93] = 132.5957794189    y[93] = 127.0037918091
x[94] = 127.9331283569    y[94] = 126.1179885864
x[95] = 130.7555236816    y[95] = 126.5633239746
x[96] = 128.2622222900    y[96] = 128.1271057129
x[97] = 127.0912322998    y[97] = 125.1954727173
x[98] = 126.6822128296    y[98] = 128.4990844727
x[99] = 129.8872528076    y[99] = 126.6480636597

             Real              Imag
Mean         127.5675023651    127.6293053436
Std. Dev.    2.0868934883      1.9825243937
Minimum      119.8516998291    119.5501251221
Maximum      132.5957794189    133.6931915283


Standard  vs.  Simulation,  17-Point 


x[0] = 121.4185028076    y[0] = 123.7153015137
x[1] = 120.8895874023    y[1] = 120.1140975952
x[2] = 122.3241653442    y[2] = 118.8880996704
x[3] = 122.3515472412    y[3] = 123.6122055054
x[4] = 121.8156280518    y[4] = 120.1766967773
x[5] = 120.1210479736    y[5] = 119.1418151855
x[6] = 122.2692947388    y[6] = 119.3881454468
x[7] = 119.8981628418    y[7] = 120.5820312500
x[8] = 120.7996978760    y[8] = 120.2281799316
x[9] = 119.6428146362    y[9] = 121.8146896362
x[10] = 121.3981552124    y[10] = 120.2237091064
x[11] = 121.5536193848    y[11] = 120.7965164185
x[12] = 119.6506652832    y[12] = 119.2724609375
x[13] = 119.6428375244    y[13] = 120.6252441406
x[14] = 124.6930618286    y[14] = 118.6544799805
x[15] = 120.1872253418    y[15] = 120.4623184204
x[16] = 123.8273925781    y[16] = 115.8238830566
x[17] = 122.0793380737    y[17] = 120.7008056641
x[18] = 117.1603088379    y[18] = 122.6836547852
x[19] = 119.4157943726    y[19] = 120.7734298706
x[20] = 119.6562957764    y[20] = 121.8309707642
x[21] = 119.9624023438    y[21] = 119.7293319702
x[22] = 119.4375228882    y[22] = 122.8455047607
x[23] = 120.8168640137    y[23] = 121.1245498657
x[24] = 121.7880325317    y[24] = 120.8520278931
x[25] = 121.4904479980    y[25] = 121.2198104858
x[26] = 120.1587295532    y[26] = 122.2774047852
x[27] = 119.0045394897    y[27] = 119.2919616699
x[28] = 117.8995742798    y[28] = 122.7669525146
x[29] = 120.6625213623    y[29] = 120.0256042480
x[30] = 120.9794235229    y[30] = 120.6208343506
x[31] = 120.2607574463    y[31] = 120.6785888672
x[32] = 122.3692016602    y[32] = 118.1977386475
x[33] = 122.3941574097    y[33] = 120.3936767578
x[34] = 121.3694000244    y[34] = 121.8156204224
x[35] = 121.0974197388    y[35] = 121.8064270020
x[36] = 122.0919265747    y[36] = 120.8452835083
x[37] = 123.8083877563    y[37] = 122.1518173218
x[38] = 123.2192001343    y[38] = 121.1457931519
x[39] = 118.2859954834    y[39] = 122.6652297974

x[40] = 123.1239242554    y[40] = 117.5355682373
x[41] = 120.5650024414    y[41] = 121.1293640137
x[42] = 120.1896514893    y[42] = 124.0279541016
x[43] = 123.9672164917    y[43] = 120.1972045898
x[44] = 121.1884918213    y[44] = 119.4192123413
x[45] = 121.6450805664    y[45] = 121.3957824707
x[46] = 121.0041503906    y[46] = 121.1322631836
x[47] = 120.9823379517    y[47] = 120.6914672852
x[48] = 121.4914703369    y[48] = 121.1651840210
x[49] = 119.3569335938    y[49] = 118.9468841553
x[50] = 120.3198394775    y[50] = 118.2699966431
x[51] = 118.8627700806    y[51] = 121.3817672729
x[52] = 122.6042404175    y[52] = 120.2716064453
x[53] = 121.0762939453    y[53] = 123.8433456421
x[54] = 119.2531814575    y[54] = 122.4837036133
x[55] = 122.8533477783    y[55] = 120.4808044434
x[56] = 116.8894882202    y[56] = 118.2416763306
x[57] = 119.6834869385    y[57] = 122.6411972046
x[58] = 118.9494781494    y[58] = 124.0523681641
x[59] = 120.2644271851    y[59] = 118.7395095825
x[60] = 119.9867706299    y[60] = 120.0951919556
x[61] = 120.6990814209    y[61] = 120.8575286865
x[62] = 119.1787338257    y[62] = 119.3044891357
x[63] = 121.7292327881    y[63] = 122.4385299683
x[64] = 121.7205047607    y[64] = 120.2123947144
x[65] = 123.3698272705    y[65] = 119.6864852905
x[66] = 123.7213287354    y[66] = 120.7648468018
x[67] = 123.2931976318    y[67] = 119.0507125854
x[68] = 121.8409805298    y[68] = 118.7207031250
x[69] = 117.1487274170    y[69] = 121.1708145142
x[70] = 117.6536636353    y[70] = 125.0488510132
x[71] = 120.4309310913    y[71] = 120.7217025757
x[72] = 120.3559417725    y[72] = 123.0536117554
x[73] = 118.4559555054    y[73] = 122.7774734497
x[74] = 120.2510299683    y[74] = 121.5375595093
x[75] = 119.4515686035    y[75] = 122.0602645874
x[76] = 121.1188812256    y[76] = 118.9356307983
x[77] = 122.2433090210    y[77] = 121.0288772583
x[78] = 121.0973434448    y[78] = 122.0551071167
x[79] = 122.3748474121    y[79] = 120.4308776855
x[80] = 117.8436126709    y[80] = 120.8064880371
x[81] = 119.2608795166    y[81] = 121.7909698486
x[82] = 125.2556686401    y[82] = 122.3520889282
x[83] = 120.3285446167    y[83] = 121.0553054810
x[84] = 118.6132125854    y[84] = 121.0168457031
x[85] = 121.4505996704    y[85] = 113.7971115112
x[86] = 121.8219985962    y[86] = 121.8599853516
x[87] = 122.8372497559    y[87] = 117.4337539673
x[88] = 122.3030853271    y[88] = 114.8612136841
x[89] = 121.3166961670    y[89] = 120.8127822876
x[90] = 123.1911087036    y[90] = 119.7116546631
x[91] = 121.0095520020    y[91] = 118.8702926636
x[92] = 117.8431091309    y[92] = 121.0273895264
x[93] = 117.5745162964    y[93] = 121.1828002930
x[94] = 122.9829483032    y[94] = 118.9766159058
x[95] = 121.1741943359    y[95] = 121.1646194458
x[96] = 115.4362182617    y[96] = 122.9944000244
x[97] = 122.2885818481    y[97] = 121.9745864868
x[98] = 117.1472167969    y[98] = 123.5706710815
x[99] = 120.5247573853    y[99] = 119.1383819580

             Real              Imag
Mean         120.7848806763    120.7065936279
Std. Dev.    1.8456750456      1.8482199671
Minimum      115.4362182617    113.7971115112
Maximum      125.2556686401    125.0488510132
ABSTRACT


This research examines a very large-scale integrated
(VLSI) circuit implementation of the Winograd and Good-Thomas
algorithms for computing discrete Fourier Transforms (DFTs)
with composite blocklengths. The theoretical background for
calculating DFTs in general is developed before the algorithms
of interest are presented in detail. Once the validity of the
algorithms is established, a VLSI architecture, which exploits
the parallelism and pipelining inherent in the algorithms, is
discussed. Winograd processors use either the small or the
large Winograd DFT algorithm to compute DFTs with blocklengths
of 15, 16, and 17. Longer blocklength DFTs (240, 255, 272,
and 4080) are computed using a pipeline of Winograd processors,
dual-port memories, and an interface processor; the pipeline
uses the Good-Thomas Prime Factor Algorithm (PFA). Fault
tolerance was included in the initial design of the VLSI
architecture. Watchdog processors check both data and addresses
of active Winograd processors, while parity checking circuits
incorporated in the Winograd processors augment data memory
error-correction coding (ECC).
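
The index mapping behind the Good-Thomas PFA can be sketched in a few lines of Python. This is an illustration only, not the VLSI pipeline: the function names `dft` and `good_thomas_dft` are hypothetical, and the short transforms are computed with a direct DFT as a stand-in for the Winograd processors. The example uses the blocklength 240 = 15 x 16 named above; the two factors must be coprime.

```python
import cmath

def dft(x):
    """Direct O(N^2) DFT, used as the short transform and as a reference."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]

def good_thomas_dft(x, n1, n2):
    """Length n1*n2 DFT for coprime n1, n2 using Good-Thomas index maps."""
    n = n1 * n2
    assert len(x) == n
    # Input map: sample index (n2*a + n1*b) mod n goes to table[a][b].
    table = [[x[(n2 * a + n1 * b) % n] for b in range(n2)] for a in range(n1)]
    # Length-n2 DFT along each row, then length-n1 DFT along each column.
    rows = [dft(row) for row in table]
    cols = [dft([rows[a][k2] for a in range(n1)]) for k2 in range(n2)]
    # Output map (Chinese remainder theorem): k = k1 mod n1 and k = k2 mod n2.
    inv2 = pow(n2, -1, n1)  # modular inverse, needs Python 3.8 or later
    inv1 = pow(n1, -1, n2)
    out = [0j] * n
    for k1 in range(n1):
        for k2 in range(n2):
            k = (n2 * inv2 * k1 + n1 * inv1 * k2) % n
            out[k] = cols[k2][k1]
    return out

if __name__ == "__main__":
    # Arbitrary test input; compare against the direct 240-point DFT.
    x = [complex((3 * i) % 7, (5 * i) % 11) for i in range(240)]
    err = max(abs(a - b) for a, b in zip(dft(x), good_thomas_dft(x, 15, 16)))
    print("largest difference from direct DFT:", err)
```

Because the factors are coprime, no twiddle-factor multiplications are needed between the row and column stages, which is what lets the short-transform processors be chained through memories.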

The numerical accuracy of the VLSI circuit was determined
using a software simulation. The signal-to-noise ratio (SNR)
was used as the accuracy metric. The signal was the output of
a standard module, which used double-precision arithmetic, while
the noise was the difference between the standard and simulation
modules. The simulation module used integer arithmetic to exactly
mimic operation of the VLSI circuit. The outputs of the standard
module were also compared with a direct evaluation of the DFT
to verify that the standard module did compute a DFT. Results
of the comparison between the standard and simulation modules
for single-factor DFTs (i.e., 15, 16, and 17) indicate the VLSI
circuit can produce results accurate enough for synthetic
aperture radar and other demanding applications.
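
The SNR metric described above can be sketched as follows. This is a minimal illustration under stated assumptions: the SNR is taken as a power ratio in decibels, the names `snr_db` and `quantize` are hypothetical, and rounding to a fixed-point grid merely stands in for the integer arithmetic of the simulation module, whose word lengths are not given in this section.

```python
import math

def snr_db(standard, simulated):
    """SNR in dB: power of the standard output over power of the difference."""
    signal_power = sum(s * s for s in standard)
    noise_power = sum((s - t) ** 2 for s, t in zip(standard, simulated))
    if noise_power == 0.0:
        return math.inf  # the two outputs agree exactly
    return 10.0 * math.log10(signal_power / noise_power)

def quantize(value, fraction_bits):
    """Round to a fixed-point grid (assumed stand-in for integer arithmetic)."""
    scale = 1 << fraction_bits
    return round(value * scale) / scale

# Hypothetical data: a double-precision sequence and its quantized copy.
standard_real = [0.25 * math.cos(0.3 * k) for k in range(16)]
simulated_real = [quantize(v, 20) for v in standard_real]
print(f"real-part SNR: {snr_db(standard_real, simulated_real):.1f} dB")
```

Applying the same function separately to the real and imaginary parts gives one figure per part, in the manner of the Real and Imag columns of the tables.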




UNCLASSIFIED 


SECURITY  CLASSIFICATION  OF  THIS  PAGE




